Andrew Schaumberg, PhD

Postdoctoral Fellow

Brigham & Women's Hospital, Harvard Medical School




Improving disease prediction with patient case metadata from social media



Background: Pathologists must rapidly provide a diagnosis for critical health issues. Some pathologists share anonymized patient cases on social media to ask colleagues for complementary opinions. What may machines learn from the timing, location, and other patient case metadata on social media?


Methods: From Twitter we curated 15,242 images from 6,800 tweets from 30 pathologists from 13 countries. We developed machine learning and deep learning models to (i) identify histopathology stains, (ii) discriminate between subspecialty-related tissue types (e.g. dermatopathology, genitourinary), and (iii) differentiate disease states. For disease prediction, we include attention mechanisms, clinical covariates, and metadata-derived covariates.


Results: Area under the receiver operating characteristic curve (AUROC) was 0.805-0.996 across the three discriminative tasks. Disease state prediction AUROC was 0.805 with ensembles and deep learning, which improved to 0.822 without ensembles and with attention. We implemented a social media bot (@pathobot on Twitter) that uses the trained classifiers to help pathologists obtain real-time feedback on challenging cases. The bot predicts disease state and lists similar cases across social media and PubMed. The classifiers found that texture and tissue were important clinico-visual features of disease. Case metadata (e.g. regional physician density, weather patterns, and socioeconomic factors) improve disease prediction. For instance, the probability of infectious disease may be greater when monthly temperatures are lower or precipitation is higher, because people tend to congregate indoors.
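For readers less familiar with the metric reported above, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is illustrative only and is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are assumptions made up for this example.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of AUROC.

    labels: iterable of 0/1 ground-truth classes (hypothetical toy data here).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example, not data from the study.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(auroc(labels, scores), 3))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.805 and 0.822 figures should be read.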


Conclusions: Our project has become an AI-guided, globally distributed network of pathology experts that facilitates pathological diagnosis and brings expertise to underserved regions or hospitals. To our knowledge, we are the first to quantify the power of patient case metadata on social media to predict disease. This leads to critical insights into socio-environmental risk factors of disease. For instance, physician density is the most important regional covariate for disease prediction, which reinforces our mission to cultivate a more connected world of pathologists and to make more pathologists virtually available at any time. Human Development Index is the second most important covariate, highlighting socioeconomic factors as disease drivers. We additionally suggest sustained mindfulness of how much is shared on social media.



  1. Appreciate social media as a source of patient case data for training deep learning algorithms in computational pathology.
  2. Understand the rich patient context and metadata available on social media, both to identify which information is most important for predicting disease and to limit over-disclosure of information.



Andrew Schaumberg is a Postdoctoral Fellow in the Faisal Mahmood Lab at Brigham & Women’s Hospital and Harvard Medical School. In 2020, he received his Ph.D. from Weill Cornell and the Tri-Institutional Training Program in Computational Biology and Medicine. His Ph.D. was supervised by Thomas Fuchs and was funded by an F31 grant from the National Cancer Institute of the NIH. His research interests in computational pathology include molecular drivers of cancer, the diagnostic process of pathologists working at the microscope, and crowd-sourced pathologist opinions on social media.



Register for #PathVisions22! 

October 16-18 | Las Vegas, NV

