Weakly-supervised learning is an effective method for assisting diagnosis/prognosis on gigapixel histopathological images


Introduction: Machine learning has demonstrated substantial success in disease diagnosis/prognosis by identifying histologic patterns on microscopic tissue sections. However, training models with manual annotations is extremely challenging due to large image sizes (billions of pixels) and the need for pathology expertise. Weakly-supervised/multiple-instance-learning (MIL) methods can mitigate this challenge by training on whole-slide-level labels, which can be obtained directly from standard clinical pathology reports without the need for pathologists' manual annotation. We proposed a vision transformer (ViT) based MIL method, called ViT-stack, which demonstrated state-of-the-art performance on two tasks using more than a thousand gigapixel whole-slide images (WSIs) of tissue sections: lymph node metastasis detection for breast cancer and invasive tumor front inflammation score prediction for bladder cancer.

Data: Lymph node metastases: CAMELYON 16 and 17 [Bejnordi B E et al. 2017], comprising 899 WSIs from 370 patients, were used. Samples were obtained from 7 different centers and scanned at 40X (0.226 μm/pixel or 0.243 μm/pixel). Each slide was labeled as positive vs. negative. A 10-fold cross-validation was performed.

Inflammation score: 105 cystectomy specimens were collected from an Ontario retrospective cohort of 105 patients. Slides were scanned at 40X (0.25 μm/pixel). Slide-level inflammation scores were provided by a clinical pathologist (Downes) as high vs. low. Data were split for training (54) and testing (51).

Methods: We used ViTs to extract feature representations at multiple resolutions, inspired by [Chen, R et al. 2022]. Instead of aggregating features using ViTs, we retained all feature vectors at each resolution and concatenated them to reduce information loss. The final feature vectors were aggregated via an attention scheme for slide-level predictions. We validated our method on two tasks: lymph node metastasis detection, classifying each WSI as positive vs. negative, and inflammation score prediction, classifying each WSI as high vs. low. The predicted inflammation scores were used as a covariate in a Cox proportional hazards survival analysis. We also experimented with two widely used MIL methods (i.e. CLAM [Lu, M et al. 2021] and HIPT [Chen, R et al. 2022]) for comparison purposes.
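The concatenate-then-attend idea can be illustrated with a minimal sketch. The abstract does not specify the attention scheme, so the code below assumes a common tanh-attention pooling form (weights a_k = softmax(w · tanh(V h_k)), slide vector z = Σ a_k h_k). All names, dimensions, and the random features are hypothetical stand-ins, not the ViT-stack implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_resolutions(per_resolution_features):
    """Concatenate one region's feature vectors from several resolutions."""
    out = []
    for vec in per_resolution_features:
        out.extend(vec)
    return out


def attention_pool(instances, V, w):
    """Assumed tanh-attention pooling: returns (slide_vector, attention_weights)."""
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


rng = random.Random(0)

# Hypothetical bag: 5 regions, each with features at two resolutions (dims 3 and 4).
bag = [
    concat_resolutions([
        [rng.gauss(0, 1) for _ in range(3)],
        [rng.gauss(0, 1) for _ in range(4)],
    ])
    for _ in range(5)
]

feat_dim, hid_dim = 7, 4
# Random (untrained) attention parameters, for illustration only.
V = [[rng.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(hid_dim)]
w = [rng.gauss(0, 0.1) for _ in range(hid_dim)]

slide_vec, attn = attention_pool(bag, V, w)
print(len(slide_vec), [round(a, 3) for a in attn])
```

In a trained model, V and w would be learned jointly with a classifier applied to the pooled slide vector; the attention weights then indicate which regions contributed most to the slide-level prediction.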

Results: For inflammation score prediction, ViT-stack, CLAM, and HIPT achieved areas under the receiver operating characteristic curve (AUCs) of 0.87, 0.85, and 0.82, respectively; for lymph node metastasis detection, the AUCs were 0.95, 0.94, and 0.94, respectively.
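For readers less familiar with the metric, the AUC equals the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative slide. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up toy values, not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy slide-level labels (1 = positive) and hypothetical model scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The pairwise form is quadratic in the number of slides, which is fine at the scale of hundreds of WSIs; rank-based formulations give the same value more efficiently.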

For survival analysis, ViT-stack yields a p-value, C-index, and hazard ratio of (0.002, 0.749, -0.752), the results most comparable to the reference standard (0.039, 0.749, -0.737). In comparison, CLAM and HIPT yield (0.046, 0.731, -0.681) and (0.082, 0.720, -0.650), respectively.
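As a reminder of what the C-index measures, the sketch below computes a concordance index over comparable patient pairs. The abstract does not state which C-index variant was used, so Harrell's formulation is an assumption here, and the follow-up times, event indicators, and risk scores are made-up toy values.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index (assumed variant).

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) and a shorter follow-up time than patient j.
    It is concordant when patient i also has the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: follow-up time, event indicator (1 = event, 0 = censored), predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(toy_c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.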

Conclusion: Our study demonstrated that MIL is an effective learning method for predicting whole-slide-level outcomes without the need for manual annotations. Our method showed the best performance and generalization capacity among the compared methods on two different tasks, using samples collected from different institutions. We envision that our methods could be applied at large scale by leveraging retrospective digital pathology data and clinical reports for disease diagnosis/prognosis and biomarker discovery.



  1. Understand clinical applications of multiple-instance learning, which leverages clinical reports for developing diagnosis/prognosis models without the need for pathologists' annotation
  2. Understand the basic concepts of weakly-supervised/multiple-instance learning for digital pathology applications
  3. Understand the concepts and workflow for processing very large datasets of gigapixel digital pathology images


Presented by:


Wenchao Han, PhD

Senior Associate Consultant

Mayo Clinic


Wenchao Han is currently a senior associate consultant and faculty member in the Division of Computational Pathology and AI, Department of Laboratory Medicine and Pathology, Mayo Clinic. He completed his postdoctoral training from 2019 to 2023 in the Department of Medical Biophysics, University of Toronto. He received his PhD in Medical Biophysics in 2020 from the University of Western Ontario. He has been working in the field of digital pathology since 2013 and specializes in image analysis for microscopic images (e.g. H&E, IHC, multiplexed images).