Predicting metastatic castration-resistant prostate cancer (mCRPC) events with machine learning: model development and cross-validation


Background: Prostate cancer metastasis requires intensified castration therapy, and the transition from metastatic castration-sensitive prostate cancer (mCSPC) to biochemically and clinically progressive metastatic castration-resistant prostate cancer (mCRPC) poses a significant challenge for choosing the appropriate therapeutic intervention. This transition also puts men at high risk of receiving unnecessarily intensified medication with high chemical and financial toxicity. The aim of the current study is to predict the transition from mCSPC to mCRPC early by applying machine learning (ML) to routine clinical and genomic data, in order to accurately stratify patients with metastatic prostate cancer for appropriate therapy.

Methods: Five clinical, biochemical, and genomic features from 424 patients with mCSPC were retrospectively collected and analyzed to develop ML-based algorithms. The genomic and biochemical features were mutation count, fraction of genome altered, and prostate-specific antigen (PSA); the clinical features were Grade Group and metastatic tumor volume. Metastatic tumor volume was classified as high volume (≥4 bone metastases or visceral metastases) or low volume. Progression to castration resistance was defined as the end point. Five supervised binary classifiers (K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression) were trained. The performance of each model was computed on training and testing sets with 10-fold cross-validation and evaluated by area under the receiver operating characteristic curve (AUROC), accuracy, precision (positive predictive value), sensitivity (recall), and F1 score using Python (v3.11.3). Correlation heat maps were used to examine correlations among features, and permutation analysis was used to assess the importance of predictors. All raw data included in the analysis are available in the cBioPortal for Cancer Genomics database. The study is supported by the NIH-NICA-T32 for Next Generation Pathologists Program at our institution.
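The 10-fold cross-validation step can be illustrated with a minimal, standard-library-only sketch. It is not the study's code: the stratified splitting, the function names, and the majority-class stand-in model are all assumptions made for illustration, and the toy labels only mirror the cohort size and the reported count of 240 progressors.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Return k lists of test indices with roughly equal class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share of each class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean and per-fold accuracy; the model is refit on every training split."""
    scores = []
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores), scores


# Hypothetical stand-in model: always predict the majority class of the training split.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Toy labels: 240 progressors (1) and 184 non-progressors (0), 424 in total.
y = [1] * 240 + [0] * 184
X = [[0.0]] * len(y)  # placeholder features; real inputs would be the five predictors
mean_acc, fold_acc = cross_val_accuracy(fit_majority, predict_majority, X, y)
print(f"majority-class baseline, mean 10-fold accuracy: {mean_acc:.2f}")
```

Any of the five classifiers could be passed in place of the stand-in through the `fit` and `predict` arguments. The majority-class baseline is included only because it gives a floor against which cross-validated accuracies can be read.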

Results: At 36 months of follow-up, 57% (n=240) of the 424 patients included in the analyses had progressed to mCRPC. Of those 240 patients, 57% (n=139) were in the high metastatic tumor volume group (n=213) and 43% (n=101) were in the low metastatic tumor volume group (n=211). Of the 424 patients, 45% (n=191) had Grade Group 5 disease. Mean±SD age and PSA level were 65.55±9.17 years and 76.42±100.52 ng/mL, respectively. The logistic regression model performed best, with an AUROC of 0.69, recall (sensitivity) of 0.80, precision of 0.69, and F1 score of 0.75. Mean accuracy across 10-fold cross-validation for K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression was 0.59, 0.56, 0.56, 0.67, and 0.67, respectively.
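For readers less familiar with the reported metrics, the following standard-library sketch shows how precision, recall (sensitivity), F1 score, accuracy, and AUROC are derived from true labels and model scores. The function names, the 0.5 decision threshold, and the eight-patient example are hypothetical and are not taken from the study data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision (PPV), recall (sensitivity), and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for eight patients (1 = progressed to mCRPC).
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.4, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics, area)
```

Applying the same F1 formula to the reported precision (0.69) and recall (0.80) gives 2 × 0.69 × 0.80 / (0.69 + 0.80) ≈ 0.74, which is consistent with the reported 0.75 once rounding of the inputs is taken into account.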

Conclusions: Our study provides proof of concept that a logistic regression model trained on routine clinical and genomic data can predict progression to metastatic castration-resistant prostate cancer with an AUROC of 0.69, and thus may guide patient stratification for appropriate therapeutic interventions in support of personalized medicine. To our knowledge, this is the first study to investigate the applicability of ML in stratifying mCSPC patients. Additional experiments are under way to expand and further validate our findings.



  1. Understand the differences between five machine learning models (K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression)
  2. Select important features for building algorithms and choose the best model as a pathologist (using metastatic prostate cancer data as an example)
  3. Understand the important role of machine learning algorithms in the current era of personalized medicine and the role of pathologists in applying such algorithms to laboratory medicine data (molecular and chemistry) to stratify patients and achieve better health outcomes.


Presented by:


Mohammad Alexanderani, MD

NIH T32 Fellow of Computational Pathology

Weill Cornell Medicine


Mohammad Alexanderani is a dedicated computational pathology fellow at Weill Cornell Medicine (WCM), excelling in the prestigious Physician Scientist Track. Recognized for his talents and dedication, he has been honored with the NIH-T32 award for next-generation pathologists. Mohammad’s primary mission is to develop cutting-edge machine learning algorithms for accurate cancer behavior prediction, utilizing Whole Slide Images and clinicogenomic data. Mohammad attributes his success to the invaluable mentorship he has received at WCM. A recipient of numerous accolades, including ASCP 40 Under 40 honoree recognition, the AASLD Young Investigator Award, and the USCAP Pathologist in Training Award, he is highly regarded in scientific circles. With a fervent commitment to pushing the boundaries of digital pathology, he aspires to become an NIH-funded computational pathologist investigator, impacting the field and improving the lives of countless patients. Beyond work, Mohammad is an avid soccer and chess player. He also enjoys runs around the captivating island of Manhattan.