Article

Prediction of Long-Term Stroke Recurrence Using Machine Learning Models

Journal

JOURNAL OF CLINICAL MEDICINE
Volume 10, Issue 6

Publisher

MDPI
DOI: 10.3390/jcm10061286

Keywords

healthcare; artificial intelligence; machine learning; interpretable machine learning; explainable machine learning; ischemic stroke; clinical decision support system; electronic health record; outcome prediction; recurrent stroke

Funding

  1. Defense Threat Reduction Agency (DTRA) [HDTRA1-18-1-0008]
  2. National Institutes of Health (NIH) [R56HL116832]
  3. Bucknell University Initiative Program
  4. ROCHE-Genentech Biotechnology Company
  5. Geisinger Health Plan Quality fund


Machine-learning models were trained to predict long-term stroke recurrence using patient-level data and interpretable algorithms, identifying important clinical features such as age, body mass index, and laboratory variables. Model performance could be optimized through different strategies to improve the balance between specificity and sensitivity.
Background: The long-term risk of recurrent ischemic stroke, estimated to be between 17% and 30%, cannot be reliably assessed at an individual level. Our goal was to study whether machine-learning models can be trained to predict stroke recurrence, to identify key clinical variables, and to assess whether performance metrics can be optimized.

Methods: We used patient-level data from electronic health records, six interpretable algorithms (Logistic Regression, Extreme Gradient Boosting, Gradient Boosting Machine, Random Forest, Support Vector Machine, Decision Tree), four feature selection strategies, five prediction windows, and two sampling strategies to develop 288 models for up to 5-year stroke recurrence prediction. We further identified important clinical features and different optimization strategies.

Results: We included 2091 ischemic stroke patients. The models' area under the receiver operating characteristic curve (AUROC) was stable across prediction windows of 1, 2, 3, 4, and 5 years, with the highest score for the 1-year window (0.79) and the lowest for the 5-year window (0.69). A total of 21 (7%) models reached an AUROC above 0.73, while 110 (38%) models reached an AUROC greater than 0.7. Among the 53 features analyzed, age, body mass index, and laboratory-based features (such as high-density lipoprotein, hemoglobin A1c, and creatinine) had the highest overall importance scores. The balance between specificity and sensitivity improved through sampling strategies.

Conclusion: All six selected algorithms could be trained to predict long-term stroke recurrence, and laboratory-based variables were highly associated with stroke recurrence. The latter could be targeted for personalized interventions. Model performance metrics could be optimized, and models can be implemented in the same healthcare system as intelligent decision support for targeted intervention.
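
The abstract reports AUROC, sensitivity, and specificity without defining them. The following is a minimal sketch of how these metrics can be computed for any binary recurrence classifier. It is not the authors' code: the function names, the labels, the scores, and the thresholds are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    when a case is predicted positive if its score is >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = recurrence within the window, 0 = no recurrence.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

print("AUROC:", auroc(toy_labels, toy_scores))
for cutoff in (0.5, 0.25):
    sens, spec = sensitivity_specificity(toy_labels, toy_scores, cutoff)
    print(f"threshold={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, lowering the decision threshold raises sensitivity at the cost of specificity while the AUROC stays fixed, since AUROC depends only on how the scores rank the cases. The sampling strategies mentioned in the abstract act on the training data rather than on the threshold, so this sketch illustrates the trade-off itself, not the study's method for improving it.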
