Article

The Machine Learning Model for Distinguishing Pathological Subtypes of Non-Small Cell Lung Cancer

Journal

FRONTIERS IN ONCOLOGY
Volume 12

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fonc.2022.875761

Keywords

[F-18]F-FDG PET; CT; radiomics; lung adenocarcinoma; lung squamous cell carcinoma; machine learning

Machine learning models were developed and validated to identify lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) using clinical factors, laboratory metrics, and radiomic features. The models showed good performance in distinguishing between LUAD and LUSC, providing a noninvasive predictive tool for clinical decision-making.
Purpose: Machine learning models were developed and validated to identify lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) using clinical factors, laboratory metrics, and 2-deoxy-2[F-18]fluoro-D-glucose ([F-18]F-FDG) positron emission tomography (PET)/computed tomography (CT) radiomic features.
Methods: One hundred and twenty non-small cell lung cancer (NSCLC) patients (62 LUAD and 58 LUSC) were analyzed retrospectively and randomized into a training group (n = 85) and a validation group (n = 35). A total of 99 features (four clinical factors, four laboratory indicators, and 91 [F-18]F-FDG PET/CT radiomic features) were used for data analysis and model construction. The Boruta algorithm was used to screen the features. The retained minimal optimal feature subset was input into ten machine learning algorithms to construct classifiers for distinguishing between LUAD and LUSC. Univariate and multivariate analyses were used to identify the independent risk factors for NSCLC subtype and to construct the Clinical model. Finally, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy (ACC) were used to validate the best-performing machine learning models and the Clinical model in the validation group, and the DeLong test was used to compare model performance.
Results: The Boruta algorithm selected an optimal subset of 13 features: two clinical features, two laboratory indicators, and nine PET/CT radiomic features. The Random Forest (RF) and Support Vector Machine (SVM) models showed the best performance in the training group. Gender (P = 0.018) and smoking status (P = 0.011) made up the Clinical model. In the validation group, the SVM model (AUC: 0.876, ACC: 0.800) and the RF model (AUC: 0.863, ACC: 0.800) performed well, while the Clinical model (AUC: 0.712, ACC: 0.686) performed moderately. There was no significant difference between the RF and Clinical models, but the SVM model was significantly better than the Clinical model.
Conclusions: The proposed SVM and RF models successfully distinguished LUAD from LUSC. The results indicate that the proposed model is an accurate and noninvasive predictive tool that can assist clinical decision-making, especially for patients who cannot undergo biopsy or whose biopsy fails.
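
The validation metrics named in the abstract (AUC, sensitivity, specificity, and ACC) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy labels and scores, the 0.5 decision threshold, and the choice of LUAD as the positive class (label 1) are all assumptions made for illustration. The AUC is computed here by pairwise comparison of positive and negative scores, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at a fixed score threshold.
    Assumes both classes are present in `labels`."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data, NOT taken from the paper:
# label 1 = LUAD (assumed positive class), label 0 = LUSC.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print(confusion_metrics(toy_labels, toy_scores))
```

On the toy data this gives an AUC of 0.875 and a sensitivity, specificity, and accuracy of 0.75 each. Comparing two AUCs on the same patients, as the DeLong test does in the study, needs a variance estimate for correlated curves and is not covered by this sketch.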
