Subject to change.
Phoenix Yu Wilkie is a PhD candidate in the Department of Medical Biophysics at the University of Toronto. Her thesis research focuses on using weakly supervised learning models to predict the recurrence of ductal carcinoma in situ. She is passionate about making cutting-edge industry technologies accessible within academia. When not at her computer, Phoenix enjoys creating art and cultivating her garden.
Background: Clustering is used in weakly and self-supervised learning to group similar images of tissue samples. Unsupervised clustering allows exploration of the latent space, which is beneficial in digital pathology classification tasks. New and improved methods for clustering pathology images continue to appear. Yet, when designing experimental research plans, it remains difficult to judge which clustering method will be most effective for a given dataset.

It is also currently difficult to assess cluster quality quantitatively for histology patches. Standard clustering evaluation typically quantifies inter- and intra-cluster distances. However, in histology-based analysis the variations in morphological features are subtle, so labelled downstream tasks are needed for cluster validation.

Dunn's Index (DI) is useful for comparing clustering algorithms or parameter settings by assessing their ability to generate compact, well-separated clusters. However, in pathology images DI often yields values of 0 because of its sensitivity to noise and outliers. Significant overlap and poor cluster separation are common when assessing clustering in uncurated patch datasets. Furthermore, metrics such as the average Silhouette Coefficient (aSC) frequently approach 0, suggesting that many mixed-tissue patches reside near or on the decision boundary between neighboring clusters.

Therefore, we propose a visual dimensionality reduction software pipeline that allows rapid, visual assessment of clusters.

Methods: A graphical user interface (GUI) application was developed to address the challenges of assessing histopathology clustering. This software pipeline facilitates rapid visual evaluation of clustering methods by enabling users to upload feature-extracted data and the corresponding patches.
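To make the Dunn's Index and silhouette behaviour described in the Background concrete, here is a minimal pure-Python sketch on hypothetical 2-D points standing in for patch embeddings (real features are high-dimensional). It assumes Euclidean distance, the classic Dunn definition (closest cross-cluster pair divided by the widest within-cluster pair), and the usual convention that a point in a singleton cluster scores 0; `dunn_index` and `mean_silhouette` are illustrative helpers, not functions of the GUI. Adding one pair of identical "mixed-tissue" points, one assigned to each cluster, drives DI to exactly 0 and lowers the mean silhouette, mirroring the failure mode the Background describes.

```python
import math
from itertools import combinations


def group_by_label(points, labels):
    """Map each cluster label to the list of its member points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = list(group_by_label(points, labels).values())
    # Separation: closest pair of points that lie in different clusters.
    min_sep = min(
        math.dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    # Compactness: widest pair of points inside any single cluster.
    max_diam = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    return min_sep / max_diam if max_diam > 0 else math.inf


def mean_silhouette(points, labels):
    """Average silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's zero distance to itself is in the sum, hence len - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D embeddings: two compact, well-separated clusters.
separated_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated_labels = [0, 0, 0, 1, 1, 1]

# Two identical "mixed-tissue" embeddings, one assigned to each cluster.
mixed_points = separated_points + [(5, 5), (5, 5)]
mixed_labels = separated_labels + [0, 1]

for name, pts, labs in [("separated", separated_points, separated_labels),
                        ("mixed", mixed_points, mixed_labels)]:
    print(f"{name}: DI = {dunn_index(pts, labs):.2f}, "
          f"aSC = {mean_silhouette(pts, labs):.2f}")
```

Because DI takes a single minimum over all cross-cluster pairs, one coincident pair is enough to zero it, whereas the silhouette is averaged over every point and degrades more gradually.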
Users can upload results from any clustering algorithm or use the program's built-in unsupervised clustering method. The built-in method was pretrained on histology patches using the SimCLR framework for contrastive self-supervised learning, which groups similar images while separating dissimilar ones. Hierarchical agglomerative k-means is then applied for further refinement.

The software allows users to run the Elbow Test to estimate the optimal number of clusters and to determine the minimum number of parent clusters in hierarchical clustering. It offers automated evaluation using established metrics and outputs centroid patches. Users with ground truth labels can choose external evaluation metrics such as the Adjusted Rand Index, Normalized Mutual Information, Homogeneity, Completeness, and V-Measure. For users without ground truth labels, internal validation metrics are available: aSC, DI, the Davies-Bouldin Index (DBI), and the Calinski-Harabasz Index (CHI).

The GUI allows users to reduce dimensionality, orient plots, and access patches. Users can visualize patches in latent space using t-SNE, PCA, Sammon mapping, and UMAP in 2D or 3D. Interactive navigation lets users right-click on data points to view the associated images, enabling manual inspection of clusters. This interactive visualization supports a thorough examination of the clustered data.

Results: Two downstream tasks were conducted, one on a public and one on a private dataset. First, using the public BreCaHAD dataset, patches containing cross-sectional views of complete tubules were identified with 98% (±1.7).

Second, tissue classification, focusing on adipose and blood patches, was performed on an unlabelled private dataset. Initial evaluation metrics were aSC = -0.02, DI = 0.0, DBI = 3.47, and CHI = 41.71.
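The Elbow Test listed under Methods plots the within-cluster sum of squared distances (inertia) against the number of clusters k and picks the k where the curve bends. Below is a minimal pure-Python sketch under stated assumptions: plain Lloyd's k-means with a deterministic farthest-point initialisation, and a second-difference rule for locating the bend. The abstract does not say which initialisation or elbow criterion the GUI uses, the built-in SimCLR features and hierarchical agglomerative refinement are not reproduced, and `kmeans` and `elbow_k` are hypothetical helpers.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation.

    Returns (centroids, labels, inertia), where inertia is the
    within-cluster sum of squared distances to the assigned centroid.
    """
    # Initialisation: start from the first point, then repeatedly add the
    # point whose nearest existing centroid is farthest away.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (ties -> lowest index).
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_k(inertias):
    """Pick k at the sharpest bend of the inertia curve.

    inertias[i] holds the inertia for k = i + 1; the bend is measured by the
    discrete second difference I(k-1) - 2*I(k) + I(k+1).
    """
    bends = {
        i + 1: inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        for i in range(1, len(inertias) - 1)
    }
    return max(bends, key=bends.get)


# Hypothetical 2-D embeddings forming three well-separated groups of four.
blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 0), (10, 1), (11, 0), (11, 1),
         (0, 10), (0, 11), (1, 10), (1, 11)]
inertias = [kmeans(blobs, k)[2] for k in range(1, 6)]
print("inertia by k:", [round(v, 2) for v in inertias])
print("elbow at k =", elbow_k(inertias))
```

Inertia always falls as k grows, so the raw minimum is uninformative; the second difference instead rewards the k after which adding clusters stops paying off. Other elbow rules (for example, a fixed relative-drop threshold) are equally plausible choices.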
By retaining only the 60% of patches closest to the centroids, aSC, DI, and DBI approached 1 (from -0.02, 0.0, and 3.47, respectively), indicating tighter clusters and greater separation between neighboring clusters. Consequently, each remaining patch contained a single tissue type.

Conclusion: Latent-space proximity was assessed by examining patches from diverse tissue labels, and visual inspection revealed clear morphological similarities among patches within clusters. The GUI streamlines the selection of optimal clustering methods for experimental pipelines and enables swift quantification with standard cluster-quality metrics. It also supports qualitative assessment through interactive exploration: clicking on points within graphical clusters displays the corresponding histology patches. Additional evaluation metrics are currently being implemented and integrated into the GUI software.
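The centroid-proximity filter reported in the Results can be sketched as follows. This is a minimal pure-Python illustration, assuming the 60% cutoff is applied per cluster (the abstract does not say whether it is per cluster or global), Euclidean distance in feature space, and that a cluster's "centroid patch" is the member nearest its mean vector. `keep_closest`, `davies_bouldin` and `centroid_patches` are hypothetical helpers, and the toy points do not reproduce the study's data. For DBI, lower values mean more compact, better-separated clusters.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance (same ordering as Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(points, labels):
    """Mean feature vector of each cluster, keyed by label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(d) / len(m) for d in zip(*m)) for lab, m in groups.items()}


def davies_bouldin(points, labels):
    """Davies-Bouldin Index: mean over clusters of the worst (S_i + S_j) / M_ij."""
    cents = centroids_of(points, labels)
    # S_i: mean Euclidean distance from cluster i's members to its centroid.
    totals = {lab: [0.0, 0] for lab in cents}
    for p, lab in zip(points, labels):
        totals[lab][0] += math.dist(p, cents[lab])
        totals[lab][1] += 1
    spread = {lab: s / n for lab, (s, n) in totals.items()}
    # M_ij: distance between centroids i and j; keep the worst ratio per cluster.
    worst = [
        max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
            for j in cents if j != i)
        for i in cents
    ]
    return sum(worst) / len(worst)


def keep_closest(points, labels, fraction=0.6):
    """Indices of roughly `fraction` of each cluster's members nearest its centroid."""
    cents = centroids_of(points, labels)
    kept = []
    for lab, c in cents.items():
        members = [i for i, other in enumerate(labels) if other == lab]
        members.sort(key=lambda i: sq_dist(points[i], c))
        kept.extend(members[: max(1, round(fraction * len(members)))])
    return sorted(kept)


def centroid_patches(points, labels):
    """Index of the member nearest each centroid, i.e. a representative patch."""
    cents = centroids_of(points, labels)
    return {
        lab: min((i for i, other in enumerate(labels) if other == lab),
                 key=lambda i: sq_dist(points[i], c))
        for lab, c in cents.items()
    }


# Hypothetical 2-D embeddings: each cluster has three "pure" patches near its
# core and two "mixed-tissue" patches drifting toward the other cluster.
points = [(0, 0), (1, 0), (0, 1), (4, 4), (5, 4),
          (10, 10), (9, 10), (10, 9), (6, 6), (5, 6)]
labels = [0] * 5 + [1] * 5

kept = keep_closest(points, labels, fraction=0.6)
before = davies_bouldin(points, labels)
after = davies_bouldin([points[i] for i in kept], [labels[i] for i in kept])
print("kept indices:", kept)
print(f"DBI before filtering: {before:.2f}, after: {after:.2f}")
print("centroid patch per cluster:", centroid_patches(points, labels))
```

Centroids are computed on all members before filtering and recomputed on the retained patches when scoring, so the drop in DBI reflects both the shrinking within-cluster spread and the centroids moving apart once boundary patches are removed.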
Learning Objectives