4.7 Article

Machine Learning Approaches to Predict 6-Month Mortality Among Patients With Cancer

Journal

JAMA NETWORK OPEN
Volume 2, Issue 10, Pages -

Publisher

AMER MEDICAL ASSOC
DOI: 10.1001/jamanetworkopen.2019.15997

Keywords

-

Funding

  1. National Institutes of Health [T32-GM075766-14]
  2. Penn Center for Precision Medicine

Ask authors/readers for more resources

IMPORTANCE Machine learning algorithms could identify patients with cancer who are at risk of short-term mortality. However, it is unclear how different machine learning algorithms compare and whether they could prompt clinicians to have timely conversations about treatment and end-of-life preferences. OBJECTIVES To develop, validate, and compare machine learning algorithms that use structured electronic health record data before a clinic visit to predict mortality among patients with cancer. DESIGN, SETTING, AND PARTICIPANTS Cohort study of 26 525 adult patients who had outpatient oncology or hematology/oncology encounters at a large academic cancer center and 10 affiliated community practices between February 1, 2016, and July 1, 2016. Patients were not required to receive cancer-directed treatment. Patients were observed for up to 500 days after the encounter. Data analysis took place between October 1, 2018, and September 1, 2019. EXPOSURES Logistic regression, gradient boosting, and random forest algorithms. MAIN OUTCOMES AND MEASURES Primary outcome was 180-day mortality from the index encounter; secondary outcome was 500-day mortality from the index encounter. RESULTS Among 26 525 patients in the analysis, 1065 (4.0%) died within 180 days of the index encounter. Among those who died, the mean age was 67.3 (95% CI, 66.5-68.0) years, and 500 (47.0%) were women. Among those who were alive at 180 days, the mean age was 61.3 (95% CI, 61.1-61.5) years, and 15 922 (62.5%) were women. The population was randomly partitioned into training (18 567 [70.0%]) and validation (7958 [30.0%]) cohorts at the patient level, and a randomly selected encounter was included in either the training or validation set. At a prespecified alert rate of 0.02, positive predictive values were higher for the random forest (51.3%) and gradient boosting (49.4%) algorithms compared with the logistic regression algorithm (44.7%). There was no significant difference in discrimination among the random forest (area under the receiver operating characteristic curve [AUC], 0.88; 95% CI, 0.86-0.89), gradient boosting (AUC, 0.87; 95% CI, 0.85-0.89), and logistic regression (AUC, 0.86; 95% CI, 0.84-0.88) models (P for comparison = .02). In the random forest model, observed 180-day mortality was 51.3%(95% CI, 43.6%-58.8%) in the high-risk group vs 3.4% (95% CI, 3.0%-3.8%) in the low-risk group; at 500 days, observed mortality was 64.4% (95% CI, 56.7%-71.4%) in the high-risk group and 7.6% (7.0%-8.2%) in the low-risk group. In a survey of 15 oncology clinicians with a 52.1% response rate, 100 of 171 patients (58.8%) who had been flagged as having high risk by the gradient boosting algorithm were deemed appropriate for a conversation about treatment and end-of-life preferences in the upcoming week. CONCLUSIONS AND RELEVANCE In this cohort study, machine learning algorithms based on structured electronic health record data accurately identified patients with cancer at risk of short-term mortality. When the gradient boosting algorithm was applied in real time, clinicians believed that most patients who had been identified as having high risk were appropriate for a timely conversation about treatment and end-of-life preferences.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available