☆ 4.7 Review

A comparison of random forest variable selection methods for classification prediction modeling

EXPERT SYSTEMS WITH APPLICATIONS (2019)

期刊

EXPERT SYSTEMS WITH APPLICATIONS

卷 134, 期 -, 页码 93-101

出版社

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2019.05.028

关键词

Random forest; Variable selection; Feature reduction; Classification

类别

Computer Science, Artificial Intelligence Engineering, Electrical & Electronic Operations Research & Management Science

资金

National Institutes of Health National Center for Advancing Translational Sciences Grant [KL2 TR001421]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Random forest classification is a popular machine learning method for developing prediction models in many research settings. Often in prediction modeling, a goal is to reduce the number of variables needed to obtain a prediction in order to reduce the burden of data collection and improve efficiency. Several variable selection methods exist for the setting of random forest classification; however, there is a paucity of literature to guide users as to which method may be preferable for different types of datasets. Using 311 classification datasets freely available online, we evaluate the prediction error rates, number of variables, computation times and area under the receiver operating curve for many random forest variable selection methods. We compare random forest variable selection methods for different types of datasets (datasets with binary outcomes, datasets with many predictors, and datasets with imbalanced outcomes) and for different types of methods (standard random forest versus conditional random forest methods and test based versus performance based methods). Based on our study, the best variable selection methods for most datasets are Jiang's method and the method implemented in the VSURF R package. For datasets with many predictors, the methods implemented in the R packages varSelRF and Boruta are preferable due to computational efficiency. A significant contribution of this study is the ability to assess different variable selection techniques in the setting of random forest classification in order to identify preferable methods based on applications in expert and intelligent systems. (C) 2019 Elsevier Ltd. All rights reserved.

A comparison of random forest variable selection methods for classification prediction modeling

期刊

EXPERT SYSTEMS WITH APPLICATIONS

出版社

PERGAMON-ELSEVIER SCIENCE LTD

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

A comparison of random forest variable selection methods for classification prediction modeling

期刊

EXPERT SYSTEMS WITH APPLICATIONS

出版社

PERGAMON-ELSEVIER SCIENCE LTD

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文