Article

Comparison of variable selection methods for clinical predictive modeling

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS (2018)

Journal

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS

Volume 116, Pages 10-17

Publisher

ELSEVIER IRELAND LTD

DOI: 10.1016/j.ijmedinf.2018.05.006

Keywords

Models, Statistical; Regression analysis; Machine learning; Data interpretation, Statistical; Electronic health records; Variable selection

Categories

Computer Science, Information Systems; Health Care Sciences & Services; Medical Informatics

Funding

National Heart, Lung, and Blood Institute [K08 HL121080]

Abstract

Objective: Modern machine learning-based modeling methods are increasingly applied to clinical problems. One such application is variable selection for predictive modeling. However, there is limited research comparing the performance of classic and modern methods for variable selection in clinical datasets. Materials and Methods: We analyzed the performance of eight variable selection methods: four regression-based methods (stepwise backward selection using p-value, stepwise backward selection using AIC, Least Absolute Shrinkage and Selection Operator, and Elastic Net) and four tree-based methods (Variable Selection Using Random Forest, Regularized Random Forests, Boruta, and Gradient Boosted Feature Selection). We used two clinical datasets of different sizes: a multicenter adult clinical deterioration cohort and a single-center pediatric acute kidney injury cohort. Method evaluation included measures of parsimony, variable importance, and discrimination. Results: In the large, multicenter dataset, the modern tree-based Variable Selection Using Random Forest and Gradient Boosted Feature Selection methods achieved the best parsimony. In the smaller, single-center dataset, the classic regression-based stepwise backward selection methods using p-value and AIC achieved the best parsimony. In both datasets, variable selection tended to decrease the accuracy of the random forest models and increase the accuracy of logistic regression models. Conclusions: The performance of classic regression-based and modern tree-based variable selection methods is associated with the size of the clinical dataset used. Classic regression-based variable selection methods seem to achieve better parsimony in clinical
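
The evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that discrimination is summarized by the area under the ROC curve and that parsimony is the fraction of candidate variables a method retains; the paper may define both differently. The function names, variable names, and toy data below are hypothetical and not taken from the study.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def parsimony(selected: Sequence[str], candidates: Sequence[str]) -> float:
    """Fraction of candidate variables retained (assumed definition).

    Lower values mean a more parsimonious selection.
    """
    return len(selected) / len(candidates)


# Hypothetical toy data, not from the paper.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
candidates = ["heart_rate", "resp_rate", "creatinine", "age"]
selected = ["heart_rate", "creatinine"]

print(auc(labels, scores))
print(parsimony(selected, candidates))
```

Under these assumptions, two selection methods could be compared by computing both numbers for the model refit on each method's selected variables; a method that keeps fewer variables without lowering the AUC would be preferred.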
