PV24 Schedule of Events

FROG: an unsupervised deep learning framework for scalable and robust end-to-end IHC quantification

   Mon, Nov 4
   03:55PM - 04:15PM ET

Quantitative immunohistochemistry (IHC) readout is important across multiple disease types for accurate diagnosis, prognosis, and treatment guidance. Currently, most reporting is based on manual assessment, which is time-consuming, labor-intensive, and subject to inter- and intra-observer variability.

Artificial intelligence (AI), especially deep neural networks, supports quantification in one of two ways: 1) instance cell segmentation followed by counting the segmented single cells for final scoring, a powerful approach that provides both a region-of-interest (ROI)- or case-level IHC score and instance cell segmentation, which facilitates verification and interpretation of results and the derivation of other IHC measurements; or 2) direct estimation of the IHC score at the ROI or case level. To achieve either, existing methods rely on supervised learning (including weakly supervised methods), which requires extensive annotated data for model training. In addition, well-known challenges persist in domain adaptation and generalization: models often show substantially reduced performance or fail completely when applied to out-of-distribution data, that is, image data that differs from the samples used during training (e.g., data drift or a different tissue or stain type). Data drift, a known issue in practice, can occur in IHC images at any stage of tissue processing, staining, or image acquisition, often unnoticed. This poses a significant risk, as models may produce inferior results without the user's awareness, potentially leading to incorrect diagnoses. The primary solution is to train the model (including retraining or fine-tuning) on out-of-distribution data. However, for supervised methods, especially single-cell models, the need for hundreds of thousands of cell-level annotations makes this labor-intensive and costly to operate, especially if performed frequently.
Crucially, such an approach does not resolve the issue of unnoticed data drift.

To overcome these challenges, we designed and developed FROG, a novel deep network framework and the first unsupervised deep learning framework for end-to-end IHC quantification. It requires no annotations, and the model can self-train on out-of-distribution data in an unsupervised manner. FROG learns to generate colored instance cell segmentation masks while simultaneously predicting cell center points and biomarker expression for positively vs. negatively stained cells, made possible by our novel dual-branch generative network structure.

We rigorously and comprehensively validated our model on multiple IHC quantification tasks, at both the case and cell level, using internal, external, and public datasets. These datasets encompass multiple tissue types (breast, lung, bladder, and prostate) and multiple nuclear IHC markers: Ki67, Estrogen Receptor (ER), and Progesterone Receptor (PR). We benchmarked FROG against a widely used model (Unet) and a state-of-the-art model (DeepLIIF).

For case-level classification of Ki67-stained breast cancer, our model identified positive and negative cells to compute scores, using a 20% cutoff based on ASCO guidelines. We used 2,136 clinically signed-out cases (with the pathologists' diagnoses used for validation), comprising 678,134 image patches and spanning 11 years at Mayo Clinic. Training and testing data were split randomly to assess overall performance and chronologically to assess the data drift situation. FROG achieved an F1 score (the harmonic mean of precision and recall) of 0.95, accuracy of 0.97, and AUC of 0.99 with random splits, substantially outperforming Unet and DeepLIIF.
Under data drift conditions, FROG maintained an F1 score of 0.92, while Unet and DeepLIIF dropped to 0.63 and 0.66, respectively, highlighting FROG's robustness to data drift.

For cell-level validation, we used two public datasets: BCDatasets (181,074 annotated Ki67 cells in 1,338 breast cancer image patches, 800 of which were used for training) and bladder and non-small cell lung cancer datasets (600 image patches). FROG matched DeepLIIF's performance (an average error of about 5 counts/patch), while Unet showed higher errors, by an additional 5 counts/patch for BCDatasets and 7 counts/patch for the bladder and lung datasets. Because the public datasets have limited sample sizes, we anticipate even better performance for FROG with more than 2,000 image patches, as indicated by our sample size experiments on other, larger datasets. Note that FROG did not use any annotations during training, while the benchmark methods (Unet and DeepLIIF) used hundreds of thousands of cell-level annotations to reach similar or inferior performance.

The third task was performed qualitatively by manually inspecting the cells that the model identified as positively or negatively expressing. We performed this task on ER- and PR-stained breast cancer tissue from a cohort of 563 cases and on Ki67-stained bladder, lung, and prostate cancer tissue from a cohort of 300 cases, which includes internal and external consultation cases. The results showed consistent and reliable performance across different tissue and IHC stain types.

In conclusion, we propose a groundbreaking paradigm for unsupervised cell-level quantification of IHC images through self-training without any annotations, thereby overcoming significant challenges in domain adaptation and generalization.
Through rigorous validation across diverse datasets and tasks, FROG matched or outperformed the benchmark models, demonstrating state-of-the-art or better performance, robustness, and generalizability, which supports accurate, efficient, and reliable clinical implementation at scale.
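
The scoring arithmetic behind the case-level and cell-level results above can be illustrated with a minimal Python sketch. It is not FROG's implementation: the names `CellCounts`, `ki67_index`, `is_high_ki67`, `f1_score`, and `mean_count_error` are hypothetical, treating the 20% cutoff as inclusive is an assumption, and reading "counts/patch error" as a mean absolute difference is also an assumption. Only the 20% cutoff and the definition of F1 as the harmonic mean of precision and recall come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class CellCounts:
    """Per-case cell counts (hypothetical container, not part of FROG)."""
    positive: int  # cells called positively stained
    negative: int  # cells called negatively stained


def ki67_index(counts: CellCounts) -> float:
    """Fraction of positively stained cells among all counted cells."""
    total = counts.positive + counts.negative
    if total == 0:
        raise ValueError("no cells counted for this case")
    return counts.positive / total


def is_high_ki67(counts: CellCounts, cutoff: float = 0.20) -> bool:
    """Case-level call at the 20% cutoff (inclusive boundary is an assumption)."""
    return ki67_index(counts) >= cutoff


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_count_error(predicted: list[int], annotated: list[int]) -> float:
    """Mean absolute difference in cell counts per patch (assumed error metric)."""
    if not predicted or len(predicted) != len(annotated):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(p - a) for p, a in zip(predicted, annotated)]
    return sum(diffs) / len(diffs)
```

For example, a case with 30 positive and 70 negative cells has an index of 0.30 and would be called high at the 20% cutoff, while a case with 10 positive and 90 negative cells would not.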

2024 Pathology Visions
