4.6 Article

Explainable Machine Learning Model to Prediction EGFR Mutation in Lung Cancer

Journal

FRONTIERS IN ONCOLOGY
Volume 12

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fonc.2022.924144

Keywords

EGFR mutation; lung cancer; prediction; machine learning; SHAP value

Funding

  1. National Natural Science Foundation of China [92159302, 81871890, 91859203]

The objective of this study is to determine if clinical features and blood markers can establish an explainable machine learning model to predict EGFR mutation in lung cancer. The researchers analyzed data using various machine learning algorithms and identified key features for prediction. The study showed promising results and highlighted the importance of artificial intelligence in diagnosing and treating lung cancer.
Objectives: The aim of this study is to determine whether clinical features, including blood markers, can be used to establish an explainable machine learning model to predict epidermal growth factor receptor (EGFR) mutation in lung cancer.
Methods: We retrospectively analyzed 7,413 patients with lung adenocarcinoma (LA) diagnosed by gene sequencing in West China Hospital of the Sichuan University from April 2015 to June 2019. The machine learning algorithms (MLAs) included logistic regression (LR), random forest (RF), LightGBM, support vector machine (SVM), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and decision tree (DT). Demographic characteristics, personal history, and blood markers were taken into account. The area under the receiver operating characteristic curve (AUC) and SHapley Additive exPlanation (SHAP) values were used to evaluate and explain the prediction models, respectively.
Results: Of the 7,413 patients with LA, 3,527 (47.6%) were identified with EGFR mutation. RF achieved the best performance in predicting EGFR mutation, with an AUC of 0.771 [95% confidence interval (CI): 0.770, 0.772], similar to that of XGBoost (AUC 0.740, 95% CI: 0.739, 0.741). The five most influential features were smoking consumption, sex, cholesterol, age, and albumin-to-globulin ratio. SHAP summary and dependence plots were used to explain the effect of the 12 features on the model and how a single feature influences the output, respectively.
Conclusion: We established EGFR mutation prediction models using MLAs and found that RF performed best (AUC 0.771, 95% CI: 0.770, 0.772), outperforming the traditional models. The artificial intelligence-based MLA prediction model may therefore become a practical tool to guide the diagnosis and therapy of LA.
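The abstract reports each model's AUC with a 95% confidence interval but does not say how the interval was obtained. As a rough illustration only, the sketch below computes the AUC from ranks (the probability that a randomly chosen mutation-positive case scores higher than a randomly chosen negative one) and attaches a percentile bootstrap interval. The bootstrap is an assumption chosen for illustration, not the authors' stated procedure, and the function names `roc_auc` and `bootstrap_auc_ci` are hypothetical. The demo data are synthetic and unrelated to the study cohort.

```python
import random


def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Mann-Whitney U statistic for the positives, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (illustrative assumption, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(round((alpha / 2) * n_boot))]
    upper = aucs[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic labels and scores; 0.476 only mimics the reported mutation rate.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.476 else 0 for _ in range(200)]
    scores = [rng.gauss(0.6 if y else 0.4, 0.2) for y in labels]
    print("AUC:", round(roc_auc(labels, scores), 3))
    print("95% CI:", bootstrap_auc_ci(labels, scores))
```

The rank formulation avoids building the ROC curve explicitly and handles tied scores by averaging ranks. In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written `roc_auc`.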
