Bridging the Gap: Building Accessible Datasets for Cancer Research

 

In many academic centers, both cancer researchers and clinicians often face challenges due to limited accessibility of clinical data and biospecimens. However, with substantial accumulation of diverse data and advancements in data extraction tools in the past decade, it is now possible to address this issue.

Since the implementation of Epic in 2011 and digital pathology in 2019, the Ohio State University (OSU) has treated over 155,000 cancer patients, with 40,000 of them having digital slides and many more with tumor blocks available. This surpasses the size of the most commonly used cancer dataset, The Cancer Genome Atlas Program (TCGA), which comprises 11,000 patients with no biospecimen accessibility.

With the assistance of the OSU Epic reporting team and computer/data scientists, we have recently developed a Power BI dataset that can be regularly updated. The dataset allows users to search for and compile lists of cancer patients seen since 2011 by tumor type or organ system, or of cancer patients with digital slides since 2019. Two otherwise identical versions have been created: one indexed with research IDs only, for basic cancer researchers, and the other indexed with both research IDs and MRNs, for the clinical team. The version indexed by research ID only can also be shared with other institutions and merged with their data to further expand the dataset. This interactive tool has attracted significant interest from the OSU cancer research community. It not only enables users to gather a list of patients with a specific tumor type but also allows them to search on multiple parameters simultaneously. For example, by searching for breast cancer and metastatic brain tumors, oncologists can easily identify the right patients for clinical trials, while pathologists can use image analysis tools to look for unique morphologic features that predict brain metastasis.
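The multi-parameter search and the two dataset versions can be illustrated with a minimal Python sketch. This is not the Power BI implementation: the record layout, field names, sample rows, and the helper functions `search` and `research_view` are all hypothetical, chosen only to show the idea of requiring several diagnoses at once and of dropping the MRN for the research version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical record layout; the real dataset's fields are not specified."""
    research_id: str
    mrn: str                   # identifying; kept only in the clinical version
    diagnoses: frozenset       # tumor types recorded for this patient
    has_digital_slides: bool


def search(records, required_diagnoses, slides_only=False):
    """Return patients who have ALL of the requested diagnoses."""
    required = frozenset(required_diagnoses)
    return [
        r for r in records
        if required <= r.diagnoses and (r.has_digital_slides or not slides_only)
    ]


def research_view(records):
    """Project records onto research-ID-only rows (no MRN)."""
    return [
        {
            "research_id": r.research_id,
            "diagnoses": sorted(r.diagnoses),
            "has_digital_slides": r.has_digital_slides,
        }
        for r in records
    ]


# Made-up sample rows for illustration only.
records = [
    PatientRecord("R001", "MRN-0001", frozenset({"breast cancer"}), False),
    PatientRecord("R002", "MRN-0002",
                  frozenset({"breast cancer", "metastatic brain tumor"}), True),
    PatientRecord("R003", "MRN-0003",
                  frozenset({"breast cancer", "metastatic brain tumor"}), False),
    PatientRecord("R004", "MRN-0004", frozenset({"colon cancer"}), True),
]

# Breast cancer patients with brain metastasis who also have digital slides.
matches = search(records, ["breast cancer", "metastatic brain tumor"],
                 slides_only=True)
research_rows = research_view(matches)
```

In this sketch, a clinical user would work with `matches` directly, while a basic researcher would receive only `research_rows`, which carry no MRN.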

Nonetheless, the current dataset is limited to tumor type and organ system, and the extracted data may not be exhaustive and could miss some patients. Furthermore, the dataset serves only as a tool to facilitate cancer research; all other regulatory requirements, such as IRB approval, still apply to any particular study. Our future goal is to expand the dataset to include additional clinical data, such as molecular results, treatment details, prognosis, and outcomes. Integrating digital pathology images with clinical data will enable artificial intelligence (AI)-powered image analysis to recognize morphologic patterns, and with it the development of new tools to aid in diagnosis, new knowledge for cancer treatment and prediction, lower costs through the replacement of expensive stains and molecular tests, and new apps to improve work efficiency.

In summary, building cancer research datasets in academic centers is necessary and feasible given the accumulation of diverse data types and the availability of advanced data extraction tools and data modeling. The deidentified dataset also permits interinstitutional data sharing and merging, which will elevate cancer research and discovery to new levels and advance the entire field at a faster pace.

 

Objectives:

  1. Building cancer research datasets in academic centers is necessary for translational research.
  2. With the accumulation of diverse data types and the availability of advanced data extraction tools and data modeling, building such datasets is feasible.
  3. The deidentified dataset also permits interinstitutional data sharing and merging, which will elevate cancer research and discovery to new levels.

 

Presented by:

 

Shaoli Sun, MD

Pathologist

The Ohio State University

 

Dr. Shaoli Sun received her MD from Zhengzhou University School of Medicine in China, completed her AP/CP residency at the Icahn School of Medicine at Mount Sinai in New York City in 1997, and became a gastrointestinal (GI) pathologist after training in GI pathology under Dr. Rodger Haggitt, a most knowledgeable and inspiring mentor. After practicing GI pathology for more than 20 years and working in a fully digitized lab, she feels privileged to use digital pathology in her daily work. Seeing how much digital pathology can do has inspired her to explore and develop more tools for pathologists to use. It is her dream to bring artificial intelligence into pathologists' practice to improve patient care.