Background
Digital pathology has, in the past decade, seen a boom in machine learning (ML) models trained on histopathological images for diagnostic and assistive purposes [1]. ML models can assist pathologists in identifying morphological structures and provide a hedge against interobserver variability [2,3]. However, researchers must often ask pathologists to complete meticulous hand-drawn annotations on tens or hundreds of whole slide images (WSI) to generate the data needed to train these models. The alternative is to label WSI at the slide level, which requires less pathologist effort but is far less efficient for training ML models [5]. We propose a new method for generating patch-level labels that can be used as training data for ML vision models. We then applied this method to create two datasets: one for training a hematopoietic bone marrow (BM) detection model, and another for training a megakaryocyte detection model. Megakaryocyte identification suffers from interobserver variability [4], yet is crucial for accurately diagnosing diseases such as immune thrombocytopenic purpura (ITP), myeloproliferative disorders, and myelodysplastic syndromes (MDS) [6]. In a workflow featuring the two models working in tandem, the first model detects patches of hematopoietic BM tissue, which are then fed into the megakaryocyte model to produce an annotated WSI with predicted BM and megakaryocytes highlighted.

Methods
We developed an image sorting app, SortImg, which allows pathologists to easily sort histology image patches into predetermined categories. WSI are tiled into patches, which are then automatically grouped by a clustering algorithm. SortImg is then used to purify these clusters, ensuring that only the desired morphologies or cell types are included. The same patch sets can be sorted by multiple pathologists, which can be leveraged to reduce interobserver variability, e.g. by taking a majority vote across readers or by clustering pathologists to obtain varied label sets. For our case, we first extracted 256 x 256 px patches from bone tissue WSI and sorted them into >50% hematopoietic BM (hematopoietic cells and adipocytes) vs. 'other' (>50% bone, fibrosis, blood, or background). We then extracted 100 x 100 px patches from the same WSI and classified them as containing megakaryocytes vs. not, again using SortImg. We collected 2000 patches for each set, each of which was split into 80% train and 20% validation (1600/400). We fine-tuned two ResNet101 models pretrained on ImageNet-1k, one for each dataset. Illustrative code sketches for these steps follow the Results section.

Results
The BM detection model differentiated between patches with >50% hematopoietic BM and 'other' patches with 98.25% accuracy, 0.9996 ROC AUC, and 0.9822 F1 score. On the validation set (n=400), it correctly classified all 200 negative patches but produced 7 false negatives among the 200 true positive patches. The megakaryocyte detection model achieved 99.00% accuracy, 0.9978 ROC AUC, and 0.9899 F1 score. On its validation set (n=400), it produced 0 false positives among 200 true negatives and 4 false negatives among 200 true positives.
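To make the workflow concrete, the following sketches outline one possible implementation of each step. They are minimal illustrations under stated assumptions, not the actual SortImg pipeline; file names, directory layouts, and hyperparameters are hypothetical throughout. The first sketch shows the tiling step, assuming OpenSlide-readable slides.

```python
# Sketch of the WSI tiling step described in Methods. Assumes OpenSlide-readable
# slides; the file path and output directory are hypothetical.
import openslide
from pathlib import Path

PATCH = 256  # 256 x 256 px patches, as used for the BM detection dataset
out_dir = Path("patches_256")
out_dir.mkdir(exist_ok=True)

slide = openslide.OpenSlide("bone_marrow_biopsy.svs")  # hypothetical file name
width, height = slide.dimensions  # level-0 dimensions

for x in range(0, width - PATCH + 1, PATCH):
    for y in range(0, height - PATCH + 1, PATCH):
        # read_region returns an RGBA PIL image at the given level-0 coordinates
        patch = slide.read_region((x, y), 0, (PATCH, PATCH)).convert("RGB")
        patch.save(out_dir / f"x{x}_y{y}.png")

slide.close()
```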
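The abstract does not name the clustering algorithm used to pre-group patches before pathologist purification. One plausible approach, shown below, is to embed each patch with an ImageNet-pretrained backbone and cluster the embeddings with k-means; both choices are illustrative assumptions.

```python
# Hypothetical pre-sorting step: embed patches with an ImageNet-pretrained
# ResNet101, then group the embeddings with k-means. The cluster count and
# feature extractor are assumptions, not details from the abstract.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.cluster import KMeans
from PIL import Image
from pathlib import Path

weights = models.ResNet101_Weights.IMAGENET1K_V2
backbone = models.resnet101(weights=weights)
backbone.fc = nn.Identity()  # drop the classifier head, keep 2048-d features
backbone.eval()
preprocess = weights.transforms()

paths = sorted(Path("patches_256").glob("*.png"))
with torch.no_grad():
    feats = torch.stack([
        backbone(preprocess(Image.open(p).convert("RGB")).unsqueeze(0)).squeeze(0)
        for p in paths
    ])

clusters = KMeans(n_clusters=10, n_init="auto").fit_predict(feats.numpy())
# Each cluster would then be purified by pathologists in SortImg.
```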
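Methods mentions majority voting across pathologists as one way to exploit SortImg's multi-reader sorting. A minimal sketch of that aggregation, with hypothetical vote data:

```python
# Majority-vote label aggregation across multiple pathologists. The vote data
# are hypothetical; with an odd reader count, ties cannot occur for binary labels.
from collections import Counter

# patch id -> labels assigned by each pathologist
votes = {
    "x0_y0.png":   ["bm", "bm", "other"],
    "x256_y0.png": ["other", "other", "other"],
}

consensus = {
    patch: Counter(labels).most_common(1)[0][0]
    for patch, labels in votes.items()
}
print(consensus)  # {'x0_y0.png': 'bm', 'x256_y0.png': 'other'}
```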
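The fine-tuning step adapts an ImageNet-1k-pretrained ResNet101 to binary patch classification. The sketch below assumes an ImageFolder-style directory of sorted patches; the optimizer, learning rate, batch size, and epoch count are assumptions consistent with, but not specified by, the Methods text.

```python
# Sketch of fine-tuning a pretrained ResNet101 on the sorted BM patch set.
# Directory layout and hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models

weights = models.ResNet101_Weights.IMAGENET1K_V2
model = models.resnet101(weights=weights)
model.fc = nn.Linear(model.fc.in_features, 2)  # BM vs. 'other'

data = datasets.ImageFolder("bm_dataset", transform=weights.transforms())
train_set, val_set = random_split(data, [0.8, 0.2])  # 1600 / 400 patches
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```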
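Finally, a sketch of validation-set evaluation producing the accuracy, ROC AUC, and F1 figures reported in Results, assuming the `model` and `val_set` from the fine-tuning sketch and scikit-learn for the metrics.

```python
# Validation-set evaluation: accuracy, ROC AUC, and F1. Assumes `model` and
# `val_set` from the previous sketch.
import torch
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

val_loader = DataLoader(val_set, batch_size=32)
model.eval()

labels, preds, scores = [], [], []
with torch.no_grad():
    for images, targets in val_loader:
        probs = torch.softmax(model(images), dim=1)[:, 1]  # P(positive class)
        scores.extend(probs.tolist())
        preds.extend((probs > 0.5).long().tolist())
        labels.extend(targets.tolist())

print("accuracy:", accuracy_score(labels, preds))
print("ROC AUC: ", roc_auc_score(labels, scores))
print("F1:      ", f1_score(labels, preds))
```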
Conclusions
We were able to train high-performance models to recognize hematopoietic BM tissue and megakaryocytes using a new pathologist-friendly workflow tool, SortImg. Labeling at the patch level is finer-grained than labeling at the slide level, allowing computer vision models to learn morphology with greater specificity, while taking less time than meticulous hand-drawn annotations. This method of labeling image patch sets could greatly improve dataset compilation workflows for researchers seeking to build assistive computer vision models for a wide range of tissues and diseases. In the current age of human-oriented medicine, we may have moved past the need for a single ground-truth label, opting instead for a more nuanced multi-label approach. SortImg, with its multi-pathologist functionality, is well suited to such a task.

References
1. McGenity C, Clarke EL, Jennings C, et al. Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy. NPJ Digit Med. 2024;7(1):1-19. doi:10.1038/s41746-024-01106-8
2. Lami K, Bychkov A, Matsumoto K, et al. Overcoming the Interobserver Variability in Lung Adenocarcinoma Subtyping: A Clustering Approach to Establish a Ground Truth for Downstream Applications. Arch Pathol Lab Med. 2023;147(8):885-895. doi:10.5858/arpa.2022-0051-OA
3. Homeyer A, Geißler C, Schwen LO, et al. Recommendations on compiling test datasets for evaluating artificial intelligence solutions in pathology. Mod Pathol. 2022;35(12):1759-1769. doi:10.1038/s41379-022-01147-y
4. Wilkins BS, Erber WN, Bareford D, et al. Bone marrow pathology in essential thrombocythemia: interobserver reliability and utility for identifying disease subtypes. Blood. 2008;111(1):60-70. doi:10.1182/blood-2007-05-091850
5. Lu MY, Williamson DFK, Chen TY, Chen RJ, Barbieri M, Mahmood F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng. 2021;5(6):555-570. doi:10.1038/s41551-020-00682-w
6. Kumar V. Robbins and Cotran Pathologic Basis of Disease. Saunders/Elsevier; 2010.