Subject to change.
Nick Spies, MD, is a medical director for the Applied Artificial Intelligence group within ARUP Laboratories' Institute for Research and Innovation, with additional roles in clinical chemistry and maternal serum screening. He is board-certified in Clinical Pathology and completed his residency at WashU in St. Louis. His research focuses on applying artificial intelligence to improve clinical laboratory operations and quality assurance.
Introduction: Whole-slide foundation models for anatomic pathology have demonstrated remarkable progress in recent years, delivering accurate predictions across a wide range of diagnostic and prognostic tasks. However, hematopathology has received less attention, with most large pathology foundation models including few, if any, slides from bone marrow or peripheral blood samples. The evaluation of bone marrow and peripheral blood specimens involves many domain-specific tasks: a typical hematopathologist will assess cases for cellularity, fibrosis, and cellular differentiation and maturation, among other features. We hypothesized that a domain-specific foundation model would outperform generalist pathology models on common hematopathology tasks.

Methods: We describe a large, generalist hematopathology model trained on 27,735 whole-slide images from 9,544 cases, encompassing approximately 165 million tiles. The model was trained on a mixture of sample types, including bone marrow core biopsies, bone marrow aspirate smears, peripheral blood smears, clots, and touch preps, and training incorporated multiple standard and immunohistochemistry stains. All slides were scanned at 40x (~0.25 μm/px) resolution, and tiles at 40x, 20x, 10x, and 5x magnifications were included in the training data. The dataset contained a 60/40 mix of tiles sampled from bone marrow core biopsies (95 million tiles, 18,326 slides) and individual leukocyte cells from bone marrow aspirates and peripheral blood smears (70 million tiles, 9,409 slides). The model itself is a Vision Transformer-Large, which we trained using the DINOv2 algorithm on an 8×H100 system for approximately 2 epochs.

Results: We compared our model to Prov-GigaPath (Xu et al., 2024), UNI (Chen et al., 2024), and H-Optimus-0 (Saillard et al., 2024). On an in-house cellularity estimation task, our model outperformed the others, achieving the lowest RMSE (7.71).
For a fibrosis quantification task on reticulin-stained bone marrow cores, our model narrowly outperformed the others, yielding an overall RMSE of 0.52. On a public dataset of classified leukocytes from peripheral blood (Matek et al., 2021), our model delivered substantially higher accuracy than the other models, with an overall F1 score of 0.79.

Conclusion: Our model demonstrated consistently strong performance across multiple hematopathology tasks, indicating that pathology foundation models with a relatively narrow scope may outperform large generalist models on in-domain tasks. Future work will include expanding to additional hematopathology-specific tasks, as well as the inclusion of additional comparison models.
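The multi-magnification tiling described in the Methods can be sketched as a simple grid enumeration. This is an illustrative sketch only, not the authors' pipeline: it assumes a fixed tile size (224 px is a common ViT input, taken here as an assumption) and expresses each tile's footprint in base-scan (40x) pixel coordinates, so a 20x tile covers twice the base-pixel span of a 40x tile.

```python
from collections import Counter

def tile_grid(width_px, height_px, tile_px=224, base_mag=40, mags=(40, 20, 10, 5)):
    """Yield (magnification, x, y, span_px) for non-overlapping tiles.

    Coordinates are in base (40x) pixel space; span_px is the edge length,
    in base pixels, covered by one tile_px-sized tile at each magnification.
    """
    for mag in mags:
        span = tile_px * base_mag // mag  # 224 base px at 40x, 448 at 20x, ...
        for y in range(0, height_px - span + 1, span):
            for x in range(0, width_px - span + 1, span):
                yield mag, x, y, span

# Hypothetical 2000 x 2000 px region of a 40x scan (not from the study):
tiles = list(tile_grid(2000, 2000))
print(Counter(m for m, *_ in tiles))  # Counter({40: 64, 20: 16, 10: 4, 5: 1})
```

In a real pipeline, each (x, y, span) would be read from the slide (e.g., via OpenSlide's `read_region`) and resized to the model's input resolution; lower magnifications contribute far fewer tiles per slide, consistent with the tile counts reported above being dominated by high-magnification crops.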
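The two metric families reported in the Results, RMSE for continuous tasks (cellularity, fibrosis) and F1 for leukocyte classification, can be computed as below. This is a minimal from-scratch sketch with made-up toy values, not the study's evaluation code; it assumes the leukocyte F1 is a macro average (unweighted mean over classes), which the abstract does not specify.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error for continuous estimates (e.g., % cellularity)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    f1s = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy cellularity estimates in percent (illustrative values only):
print(rmse([30, 50, 70], [35, 45, 75]))  # 5.0

# Toy leukocyte labels (illustrative values only):
print(round(macro_f1(["blast", "lymph", "blast"], ["blast", "lymph", "lymph"]), 3))
```

The same quantities are available as `sklearn.metrics.root_mean_squared_error` and `sklearn.metrics.f1_score(average="macro")` for larger-scale evaluation.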
Learning Objectives
1. Understand the potential advantages of training narrowly scoped models for higher performance in specialized domains
2. Understand the unique challenges of hematopathology foundation models
3. Understand the commonalities and differences of hematopathology-specific tasks compared to the anatomical pathology domain
As artificial intelligence becomes increasingly embedded in laboratory workflows, the challenge shifts from deployment to oversight. This session brings together real-world insights from institutions actively using AI-enabled analyses to explore how labs monitor performance, detect data drift, and manage out-of-distribution inputs. Speakers will share how QA/QC plans have evolved to accommodate AI and discuss practical strategies for ensuring reliability, safety, and clinical relevance. Designed as a candid conversation among teams experienced in running AI-enabled analyses in the lab, this session offers vendors and potential adopters a grounded view of what it takes to make AI work in practice, and what matters most when it does.