Pathology-specific foundation models have emerged as a powerful tool for representing histopathology images and facilitating the development of downstream applications for a wide variety of use cases, using less task-specific data and fewer computational resources than traditional machine learning methods. However, one challenge that can limit the utilization of these models is that nearly all available foundation models rely on creating thousands to hundreds of thousands of cropped image 'patches' from each whole slide image (WSI). Processing this many patches per slide (and per case) can be computationally expensive. In addition, most foundation model development and evaluation has focused on relatively high-magnification image interpretation tasks, with comparatively little attention on lower-magnification tasks such as quality control or tissue type identification, even though these can also be a critical part of interpretation workflows.

In this work we develop and evaluate foundation models for both high- and low-magnification patches and tasks, with which entire WSIs can be represented using orders of magnitude fewer patch embeddings. Specifically, we train models using patches across a range of magnifications, from ~4 mm² of tissue per patch (256 x 256 pixels at ~16 microns per pixel, or 0.625x) down to ~0.1 mm² per patch (~0.5 microns per pixel).

For evaluation, we augment high-resolution benchmark tasks with 'low-resolution' tasks such as stain quality, specimen type, tissue type, and tumor grading, and use this task set to evaluate models via linear probing on held-out data. Area under the receiver operating characteristic curve (AUC) was calculated for individual tasks as well as averaged across tasks to summarize findings.

We find that training with either low-resolution or high-resolution patches results in models that generally perform better on tasks at the matched resolution (i.e., training with low-resolution patches yields better performance on low-resolution tasks). Training with a combination of low- and high-resolution patches resulted in performance on par with the low-resolution model for low-resolution tasks (average AUC across low-resolution tasks: 0.897 for the combined model vs. 0.900 for the low-resolution model) and on par with the high-resolution model for high-resolution tasks (average AUC across high-resolution tasks: 0.928 vs. 0.935).

We propose that this type of 'pan-magnification' model, which is flexible to input image size and resolution, offers an important option to consider when choosing a WSI embedding strategy, optimizing performance across tasks of varying magnification while enabling computational efficiency for lower-resolution tasks.
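To make the computational argument concrete, the back-of-the-envelope sketch below counts the non-overlapping 256 x 256 patches needed to tile a slide at ~0.5 vs. ~16 microns per pixel. The 20 mm x 15 mm tissue size, the patch_count helper, and the non-overlapping tiling scheme are illustrative assumptions, not details from the abstract:

import math

def patch_count(slide_w_um: float, slide_h_um: float,
                mpp: float, patch_px: int = 256) -> int:
    """Count non-overlapping patch_px x patch_px tiles needed to cover
    a slide of the given physical size at `mpp` microns per pixel."""
    patch_um = patch_px * mpp  # physical side length of one patch
    return math.ceil(slide_w_um / patch_um) * math.ceil(slide_h_um / patch_um)

# Illustrative 20 mm x 15 mm tissue region (20,000 x 15,000 microns).
w_um, h_um = 20_000, 15_000
print(patch_count(w_um, h_um, mpp=0.5))   # ~18,500 high-resolution patches
print(patch_count(w_um, h_um, mpp=16.0))  # 20 low-resolution patches

For this hypothetical slide the low-magnification tiling needs roughly 900-fold fewer patch embeddings, consistent with the 'orders of magnitude' framing above.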
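The linear-probing evaluation can likewise be sketched as fitting a logistic-regression classifier on frozen embeddings and scoring held-out data with AUC. This is a minimal sketch of the general technique, not the authors' exact protocol; the linear_probe_auc helper, the 384-dimensional embeddings, and the randomly generated stand-in data are all hypothetical:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def linear_probe_auc(train_emb, train_labels, test_emb, test_labels):
    """Fit a logistic-regression probe on frozen embeddings and report
    (one-vs-rest, for multiclass tasks) AUC on held-out data."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_emb, train_labels)
    scores = clf.predict_proba(test_emb)
    if scores.shape[1] == 2:  # binary task
        return roc_auc_score(test_labels, scores[:, 1])
    return roc_auc_score(test_labels, scores, multi_class="ovr")

# Random data stands in for real slide-level features and task labels.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 384))      # 384-dim embeddings (illustrative)
labels = rng.integers(0, 2, size=200)
auc = linear_probe_auc(emb[:150], labels[:150], emb[150:], labels[150:])
print(f"held-out AUC: {auc:.3f}")

Per-task AUCs computed this way can then be averaged within the low-resolution and high-resolution task groups to produce summary numbers like those reported above.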