A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas

Journal

CANCERS
Volume 14, Issue 19

Publisher

MDPI
DOI: 10.3390/cancers14194664

Keywords

machine learning; epidermal growth factor; radiomics; lung adenocarcinoma

Funding

  1. Science Foundation of Hunan Province [2022JJ70072, 2021JJ70151]
  2. Clinical medical technology innovation guidance project of Hunan Province [2020SK53706]
  3. China Medical Education Association 2020 major scientific problems and medical technology problems [2020KTS010]
  4. National Multidisciplinary Cooperative Diagnosis and Treatment Capacity Building Project for Major Disease (Lung Cancer) [z027002]

This study aimed to find a novel data-driven genetic testing method that can effectively predict the mutation status of EGFR based on a prediction model combining clinical features. The results showed that the subset of radiographic and clinical features selected by feature engineering effectively predicted the EGFR mutation status of this NSCLC patient cohort.
Simple Summary

Targeted therapy against epidermal growth factor receptor (EGFR) mutations has become the standard of care for non-small cell lung cancer (NSCLC), but there has not been an efficient genetic test for these patients. The present study aims to find a novel data-driven genetic testing method that can effectively predict the mutation status of EGFR based on a prediction model that combines radiomic and clinical features. The results of this study provide a strong theoretical basis for the establishment of an effective mutation prediction model. The prediction model can serve as a valuable reference for diagnosing EGFR mutations and planning subsequent treatment.

Abstract

Data from 758 patients with lung adenocarcinoma were retrospectively collected. All patients had undergone computed tomography imaging and EGFR gene testing. Radiomic features were extracted using the medical imaging tool 3D-Slicer and were combined with the clinical features to build a machine learning prediction model. The high-dimensional feature set was screened for optimal feature subsets using principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). Model prediction of EGFR mutation status in the validation group was evaluated using multiple classifiers.

In total, six clinical features and 622 radiomic features were initially collected. Thirty-one radiomic features with non-zero coefficients were obtained by LASSO regression, and 24 features correlated with label values were obtained by PCA. The radiomic features shared by these two methods were selected and combined with the clinical features of the respective patient to form a subset of features related to EGFR mutations. The full dataset was partitioned into training and test sets at a ratio of 7:3, with 10-fold cross-validation applied.

The mean area under the curve (AUC) and accuracy (Acc) of the four classifiers under cross-validation were: (1) K-nearest neighbor (AUCmean = 0.83, Acc = 81%); (2) random forest (AUCmean = 0.91, Acc = 83%); (3) LightGBM (LGBM) (AUCmean = 0.94, Acc = 88%); and (4) support vector machine (AUCmean = 0.79, Acc = 83%). In summary, the subset of radiographic and clinical features selected by feature engineering effectively predicted the EGFR mutation status of this NSCLC patient cohort.
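
The evaluation steps described in the abstract (keeping the features shared by two selection methods, splitting samples into 10 folds, and averaging AUC and accuracy across folds) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, the random seed, and the toy labels and scores are all assumptions made for illustration, and it uses only the Python standard library rather than whatever tooling the study used.

```python
import random
from statistics import mean


def shared_features(lasso_selected, pca_selected):
    """Keep only the features chosen by both selection methods (sorted)."""
    return sorted(set(lasso_selected) & set(pca_selected))


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an illustrative choice
    return [indices[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_auc_over_folds(labels, scores, folds):
    """Average the AUC computed separately on each fold's samples."""
    per_fold = [
        roc_auc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in folds
    ]
    return mean(per_fold)


if __name__ == "__main__":
    # Hypothetical feature names, labels, and scores; not data from the study.
    print(shared_features(["glcm_a", "shape_b"], ["shape_b", "firstorder_c"]))
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores), accuracy(toy_labels, toy_scores))
    print([len(fold) for fold in kfold_indices(758, k=10)])
```

The pairwise AUC definition is quadratic in the number of samples, which is fine for a cohort of a few hundred patients; a rank-based formulation or a library routine would be preferable for larger data.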
