Article

Development of a Multilevel Model to Identify Patients at Risk for Delay in Starting Cancer Treatment

Journal

JAMA NETWORK OPEN
Volume 6, Issue 8

Publisher

AMER MEDICAL ASSOC
DOI: 10.1001/jamanetworkopen.2023.28712

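In this cohort study, a machine learning model was developed and validated to estimate the likelihood of a delay of more than 60 days in starting cancer therapy. The model incorporates electronic health record (EHR) data and neighborhood-level social determinants of health (SDOH) data. Such delays disproportionately affect vulnerable populations, so identifying at-risk patients early may help improve their treatment experience and outcomes.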
Importance: Delays in starting cancer treatment disproportionately affect vulnerable populations and can influence patients' experience and outcomes. Machine learning algorithms incorporating electronic health record (EHR) data and neighborhood-level social determinants of health (SDOH) measures may identify at-risk patients.

Objective: To develop and validate a machine learning model for estimating the probability of a treatment delay using multilevel data sources.

Design, Setting, and Participants: This cohort study evaluated 4 machine learning approaches for estimating the likelihood of a treatment delay greater than 60 days:
- group least absolute shrinkage and selection operator (LASSO)
- bayesian additive regression trees
- gradient boosting
- random forest

Criteria for selecting among the approaches were discrimination, calibration, and interpretability/simplicity. The multilevel data set included clinical, demographic, and neighborhood-level census data derived from the EHR, the cancer registry, and the American Community Survey. Patients with invasive breast, lung, colorectal, bladder, or kidney cancer diagnosed from 2013 to 2019 and treated at a comprehensive cancer center were included. Data analysis was performed from January 2022 to June 2023.

Exposures: Variables included demographics, cancer characteristics, comorbidities, laboratory values, imaging orders, and neighborhood variables.

Main Outcomes and Measures: The outcome estimated by the machine learning models was the likelihood of a delay greater than 60 days between cancer diagnosis and treatment initiation. The primary metric used to evaluate model performance was the area under the receiver operating characteristic curve (AUC-ROC).

Results: A total of 6409 patients were included (mean [SD] age, 62.8 [12.5] years; 4321 [67.4%] female; 2576 [40.2%] with breast cancer, 1738 [27.1%] with lung cancer, and 1059 [16.5%] with kidney cancer). A total of 1621 patients (25.3%) experienced a delay greater than 60 days.
The selected group LASSO model had an AUC-ROC of 0.713 (95% CI, 0.679-0.745).

Several factors were associated with a lower likelihood of delay:
- diagnosis at the treating institution
- a first malignant neoplasm
- Asian or Pacific Islander or White race
- private insurance
- absence of comorbidities

A greater likelihood of delay was seen at both extremes of neighborhood deprivation. Model performance (AUC-ROC) was lower in Black patients, in patients with race and ethnicity other than non-Hispanic White, and in those living in the most disadvantaged neighborhoods. Although the model selected neighborhood SDOH variables as contributing variables, performance was similar when the model was fit with and without these variables.

Conclusions and Relevance: In this cohort study, a machine learning model incorporating EHR and SDOH data was able to estimate the likelihood of delays in starting cancer therapy. Future work should focus on additional ways to incorporate SDOH data to improve model performance, particularly in vulnerable populations.
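The abstract names group LASSO as the selected approach but does not state its objective. For orientation, a standard formulation of group LASSO penalized logistic regression is sketched below.

In this sketch:
- y_i = 1 if patient i had a delay greater than 60 days.
- x_i is that patient's covariate vector.
- β_g collects the p_g coefficients of variable group g, for example all indicator columns of one categorical variable.

The group definitions, the sqrt(p_g) weights, and how λ was tuned are assumptions, not details reported in the abstract.

```latex
\begin{aligned}
(\hat{\beta}_0, \hat{\beta}) &= \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
  + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2,
  \qquad \eta_i = \beta_0 + x_i^{\top}\beta, \\
\hat{p}_i &= \Pr(\text{delay} > 60 \text{ days} \mid x_i)
  = \frac{1}{1 + e^{-(\hat{\beta}_0 + x_i^{\top}\hat{\beta})}}.
\end{aligned}
```

The penalty acts on the Euclidean norm of each block β_g, so an entire group of columns is either retained or shrunk exactly to zero. This is what lets the fitted model report whole variables, such as neighborhood SDOH measures, as selected or not.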

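The abstract uses AUC-ROC both as the primary metric and to compare performance across patient subgroups. The minimal sketch below shows how that quantity can be computed. It builds the binary outcome from days between diagnosis and treatment, then computes AUC-ROC through its Mann-Whitney interpretation, overall and within each subgroup.

The function names (`delayed`, `auc_roc`, `auc_by_group`) and all data values are hypothetical illustrations, not the study's code or data. The abstract also does not say how its 95% CI was obtained, so no interval is computed here.

```python
from collections import defaultdict

# Outcome definition from the abstract: a delay of MORE than 60 days
# between cancer diagnosis and treatment initiation.
DELAY_THRESHOLD_DAYS = 60


def delayed(days_to_treatment: int) -> int:
    """Return 1 if treatment started more than 60 days after diagnosis, else 0."""
    return int(days_to_treatment > DELAY_THRESHOLD_DAYS)


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen delayed patient is
    scored higher than a randomly chosen non-delayed patient (ties count 1/2).
    Uses a pairwise O(n_pos * n_neg) loop, kept simple for clarity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC is undefined unless both outcome classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(groups, labels, scores):
    """Compute AUC-ROC separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))  # group -> (labels, scores)
    for g, y, s in zip(groups, labels, scores):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc_roc(ys, ss) for g, (ys, ss) in buckets.items()}


# Hypothetical toy data (NOT from the study): days to treatment,
# model-predicted probabilities of delay, and an illustrative subgroup label.
days = [10, 95, 40, 70, 200, 30]
labels = [delayed(d) for d in days]          # [0, 1, 0, 1, 1, 0]
scores = [0.2, 0.7, 0.4, 0.3, 0.9, 0.5]
groups = ["A", "A", "A", "B", "B", "B"]

print(round(auc_roc(labels, scores), 3))     # 0.778
print(auc_by_group(groups, labels, scores))  # {'A': 1.0, 'B': 0.5}
```

In practice, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently. The toy gap between groups A and B only illustrates how a lower subgroup AUC-ROC is measured; it does not reproduce the values reported in the study.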