Article

A Systematic Approach for Variable Selection With Random Forests: Achieving Stable Variable Importance Values

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2017)

Journal

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS

Volume 14, Issue 11, Pages 1988-1992

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/LGRS.2017.2745049

Keywords

Mean decrease in accuracy (MDA); mean decrease in Gini (MDG) index; random forest; variable reduction

Categories

Geochemistry & Geophysics; Engineering, Electrical & Electronic; Remote Sensing; Imaging Science & Photographic Technology

Funding

Environment and Climate Change Canada
Defence Research and Development Canada

Abstract

Random Forests variable importance measures are often used to rank variables by their relevance to a classification problem and subsequently reduce the number of model inputs in high-dimensional data sets, thus increasing computational efficiency. However, because training data and predictor variables are randomly selected when constructing each tree and splitting each node, it is also well known that if too few trees are generated, variable importance rankings tend to differ between model runs. In this letter, we characterize the effect of the number of trees (ntree) and class separability on the stability of variable importance rankings and develop a systematic approach to define the number of model runs and/or trees required to achieve stability in variable importance measures. Results demonstrate that either a large ntree for a single model run or values averaged across multiple model runs with fewer trees is sufficient for achieving stable mean importance values. While the latter is far more computationally efficient, both methods tend to lead to the same ranking of variables. Moreover, the optimal number of model runs differs depending on the separability of classes. Recommendations are made to users regarding how to determine the number of model runs and/or trees that are required to achieve stable variable importance rankings.
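The idea of averaging importance values across several model runs until the ranking settles can be illustrated with a minimal sketch. It is not the procedure from the letter: the stopping rule (ranking unchanged for `patience` consecutive runs), the function names, and the `simulated_run` stand-in are all assumptions made for illustration. In practice `run_importance` would fit a random forest with a given seed and return its MDA or MDG values.

```python
import random
from statistics import mean


def mean_importance(runs):
    """Average each variable's importance over all model runs so far."""
    return [mean(values) for values in zip(*runs)]


def ranking(importances):
    """Variable indices ordered from most to least important."""
    return sorted(range(len(importances)), key=lambda i: -importances[i])


def runs_until_stable(run_importance, max_runs=200, patience=10):
    """Add model runs one at a time until the ranking of the running mean
    has stayed the same for `patience` consecutive runs (assumed rule)."""
    runs, last, unchanged = [], None, 0
    for seed in range(max_runs):
        runs.append(run_importance(seed))
        current = ranking(mean_importance(runs))
        unchanged = unchanged + 1 if current == last else 0
        last = current
        if unchanged >= patience:
            break
    return len(runs), last


def simulated_run(seed, true_importance=(0.5, 0.3, 0.1), noise=0.05):
    """Hypothetical stand-in for one random forest fit: fixed importance
    values plus seed-dependent Gaussian noise."""
    rng = random.Random(seed)
    return [value + rng.gauss(0, noise) for value in true_importance]


n_runs, order = runs_until_stable(simulated_run)
print(n_runs, order)
```

Raising `noise` in the stand-in, which loosely mimics using fewer trees per run, tends to increase the number of runs needed before the ranking stops changing.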
