4.5 Article

Correlation and variable importance in random forests

期刊

STATISTICS AND COMPUTING
卷 27, 期 3, 页码 659-678

出版社

SPRINGER
DOI: 10.1007/s11222-016-9646-1

关键词

Random forests; Supervised learning; Variable importance; Variable selection

向作者/读者索取更多资源

This paper is about variable selection with the random forests algorithm in presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task, that becomes even more challenging in the presence of highly correlated predictors. Firstly we provide a theoretical study of the permutation importance measure for an additive regression model. This allows us to describe how the correlation between predictors impacts the permutation importance. Our results motivate the use of the recursive feature elimination (RFE) algorithm for variable selection in this context. This algorithm recursively eliminates the variables using permutation importance measure as a ranking criterion. Next various simulation experiments illustrate the efficiency of the RFE algorithm for selecting a small number of variables together with a good prediction error. Finally, this selection algorithm is tested on the Landsat Satellite data from the UCI Machine Learning Repository.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据