Article

Comparison of Machine Learning Methods With Traditional Models for Use of Administrative Claims With Electronic Medical Records to Predict Heart Failure Outcomes

Journal

JAMA NETWORK OPEN
Volume 3, Issue 1

Publisher

AMER MEDICAL ASSOC
DOI: 10.1001/jamanetworkopen.2019.18962

Funding

  1. Bayer AG
  2. NHLBI NIH HHS [K01 HL142847] Funding Source: Medline

Key Points

Question: Can prediction of patient outcomes in heart failure based on routinely collected claims data be improved with machine learning methods and incorporating linked electronic medical records?

Findings: In this prognostic study including records on 9502 patients, machine learning methods offered only limited improvement over logistic regression in predicting key outcomes in heart failure based on administrative claims. Inclusion of additional predictors from electronic medical records improved prediction for mortality, heart failure hospitalization, and loss in home days but not for high cost.

Meaning: Models based on claims-only predictors may achieve modest discrimination and accuracy in prediction of key patient outcomes in heart failure, and machine learning approaches and incorporation of additional predictors from electronic medical records may offer some improvement in risk prediction of select outcomes.

Importance: Accurate risk stratification of patients with heart failure (HF) is critical to deploy targeted interventions aimed at improving patients' quality of life and outcomes.

Objectives: To compare machine learning approaches with traditional logistic regression in predicting key outcomes in patients with HF and evaluate the added value of augmenting claims-based predictive models with electronic medical record (EMR)-derived information.

Design, Setting, and Participants: A prognostic study with a 1-year follow-up period was conducted including 9502 Medicare-enrolled patients with HF from 2 health care provider networks in Boston, Massachusetts (providers include physicians, clinicians, other health care professionals, and the institutions that comprise the networks). The study was performed from January 1, 2007, to December 31, 2014; data were analyzed from January 1 to December 31, 2018.

Main Outcomes and Measures: All-cause mortality, HF hospitalization, top cost decile, and home days loss greater than 25% were modeled using logistic regression, least absolute shrinkage and selection operator regression, classification and regression trees, random forests, and gradient-boosted modeling (GBM). All models were trained using data from network 1 and tested in network 2. After selecting the most efficient modeling approach based on discrimination, Brier score, and calibration, area under precision-recall curves (AUPRCs) and net benefit estimates from decision curves were calculated to focus on the differences when using claims-only vs claims+EMR predictors.

Results: A total of 9502 patients with HF with a mean (SD) age of 78 (8) years were included: 6113 from network 1 (training set) and 3389 from network 2 (testing set). Gradient-boosted modeling consistently provided the highest discrimination, lowest Brier scores, and good calibration across all 4 outcomes; however, logistic regression had generally similar performance. C statistics for logistic regression based on claims-only predictors were: mortality, 0.724 (95% CI, 0.705-0.744); HF hospitalization, 0.707 (95% CI, 0.676-0.737); high cost, 0.734 (95% CI, 0.703-0.764); and home days loss, 0.781 (95% CI, 0.764-0.798). C statistics for GBM were: mortality, 0.727 (95% CI, 0.708-0.747); HF hospitalization, 0.745 (95% CI, 0.718-0.772); high cost, 0.733 (95% CI, 0.703-0.763); and home days loss, 0.790 (95% CI, 0.773-0.807). Higher AUPRCs were obtained for claims+EMR vs claims-only GBMs predicting mortality (0.484 vs 0.423), HF hospitalization (0.413 vs 0.403), and home time loss (0.575 vs 0.521) but not cost (0.249 vs 0.252). The net benefit for claims+EMR vs claims-only GBMs was higher at various threshold probabilities for mortality and home time loss outcomes but similar for the other 2 outcomes.

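The evaluation measures named above (C statistic, Brier score, and net benefit at a threshold probability) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels and probabilities are hypothetical, and the formulas are the standard textbook definitions, assumed here to match what the study reports.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the
    higher predicted probability; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                concordant += 1.0
            elif pp == pn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: true positives minus false positives
    weighted by the odds of the threshold, per patient."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data, not taken from the study.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.5, 0.2]

print(round(c_statistic(labels, probs), 3))
print(round(brier_score(labels, probs), 3))
print(round(net_benefit(labels, probs, 0.25), 3))
```

Evaluating `net_benefit` over a grid of thresholds for two models, for example a claims-only model and a claims+EMR model, gives the kind of decision-curve comparison the abstract describes.
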
Conclusions and Relevance: Machine learning methods offered only limited improvement over traditional logistic regression in predicting key HF outcomes. Inclusion of additional predictors from EMRs to claims-based models appeared to improve prediction for some, but not all, outcomes.

This prognostic study compares several machine learning approaches with traditional logistic regression for development of predictive models for all-cause mortality, heart failure hospitalization, high cost, and loss in home time, among patients with heart failure.
