Article

Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients

Journal

EUROPEAN JOURNAL OF CANCER
Volume 144, Pages 224-231

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ejca.2020.11.030

Keywords

Breast cancer; Hormone receptor positive; CDK4/6-inhibitors; Machine learning; Natural language processing; Electronic health records

Funding

  1. MICINN-SPAIN [TIN2017-88728-C2-1R]
  2. Pfizer [PS20200]

This study developed predictive models for response in HR+/HER2-negative metastatic breast cancer patients using machine learning methods, finding that models based on NLP free-text processing are slightly better than those based on manually extracted data.
Background: CDK4/6 inhibitors plus endocrine therapy are the current standard of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there are no well-established clinical or molecular predictive factors for patient response. In the era of personalised oncology, new approaches for developing predictive models of response are needed.

Materials and methods: Data derived from the electronic health records (EHRs) of real-world patients with HR+/HER2-negative advanced breast cancer were used to develop predictive models for early and late progression to first-line treatment. Two machine learning approaches were used: a classic approach using a dataset of features manually extracted from the reviewed patient EHRs, and a second approach using natural language processing (NLP) of free-text clinical notes recorded during medical visits.

Results: Of the 610 patients included, 473 (77.5%) had a progression to first-line treatment, 126 of which (20.6% of all patients) occurred within the first 6 months. A total of 152 patients (24.9%) showed no disease progression before 28 months from the onset of first-line treatment (long responders). The best predictive model for early progression using the manually extracted dataset achieved an area under the curve (AUC) of 0.734 (95% CI 0.687-0.782). Using the NLP free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714-0.800). The best model to predict long responders using manually extracted data obtained an AUC of 0.669 (95% CI 0.608-0.730). With NLP free-text processing, the best model attained an AUC of 0.752 (95% CI 0.705-0.799).

Conclusions: Using machine learning methods, we developed predictive models for early and late progression to first-line treatment of HR+/HER2-negative metastatic breast cancer, and found that NLP-based machine learning models performed slightly better than predictive models based on manually extracted data.

© 2020 Elsevier Ltd. All rights reserved.
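The two endpoints and the AUC figures reported above can be made concrete with a small sketch. It is not the authors' pipeline: the `Patient` record, the rule that leaves patients censored before 28 months unlabelled, and the percentile-bootstrap confidence interval are all hypothetical choices for illustration (the abstract does not say how the 95% CIs were computed). Given per-patient risk scores from any model, whether trained on manually extracted features or on NLP features from clinical notes, it labels early progression (within 6 months) and long response (no progression before 28 months), then reports an AUC with an interval.

```python
"""Hypothetical sketch, not the study's code: endpoint labels plus AUC with a
percentile-bootstrap 95% CI. All names and labelling rules are assumptions."""
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Patient:
    months: float     # months from start of first-line treatment to progression or last follow-up
    progressed: bool  # True if progression was recorded at `months`


def early_progression(p: Patient) -> bool:
    # Early progression: progression within the first 6 months.
    return p.progressed and p.months <= 6


def long_responder(p: Patient) -> Optional[bool]:
    # Long responder: no progression before 28 months.
    if p.months >= 28:
        return True
    if p.progressed:
        return False
    # Censored before 28 months without progression: outcome unknown (assumed exclusion rule).
    return None


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Mann-Whitney form: share of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    # Resample patients with replacement; skip resamples that contain only one class.
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    # Percentile interval: take the alpha/2 and 1 - alpha/2 quantiles of the bootstrap AUCs.
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy records, not study data.
    cohort = [Patient(4, True), Patient(30, False), Patient(12, True), Patient(8, False)]
    print([early_progression(p) for p in cohort])  # [True, False, False, False]
    print([long_responder(p) for p in cohort])     # [False, True, False, None]
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUC costs O(positives × negatives), which is unproblematic for a cohort of a few hundred patients such as the 610 described here; a rank-based formula would scale better to much larger datasets.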