Article

Robustness of random forests for regression

Journal

JOURNAL OF NONPARAMETRIC STATISTICS
Volume 24, Issue 4, Pages 993-1006

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/10485252.2012.715161

Keywords

random forest; quantile regression forest; robustness; median; ranks; least-absolute deviations

Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC)
  2. Le Fonds québécois de la recherche sur la nature et les technologies (FQRNT)


In this paper, we empirically investigate the robustness of random forests for regression problems. We also investigate the performance of six variations of the original random forest method, all aimed at improving robustness. These variations are based on three main ideas: (1) robustifying the aggregation method, (2) robustifying the splitting criterion, and (3) taking a robust transformation of the response. More precisely, with the first idea, we use the median (or weighted median), instead of the mean, to combine the predictions from the individual trees. With the second idea, we use least absolute deviations from the median, instead of least squares, as the splitting criterion. With the third idea, we build the trees on the ranks of the response instead of its original values. The competing methods are compared via a simulation study on artificial data with two different types of contamination, and on 13 real data sets. Our results show that all three ideas improve the robustness of the original random forest algorithm. However, a robust aggregation of the individual trees is generally more profitable than a robust splitting criterion.
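The effect of idea (1), median aggregation, can be illustrated with a minimal sketch. The per-tree predictions below are simulated numbers, not output of the paper's actual forests; a few "trees" are deliberately contaminated with large errors to mimic the heavy-tailed settings studied in the simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predictions of 100 trees for 5 test points, centred at the
# true value 10; the first 5 trees are grossly contaminated (+50).
preds = rng.normal(loc=10.0, scale=1.0, size=(100, 5))
preds[:5] += 50.0

# Classical random-forest aggregation: mean over trees (sensitive to
# the contaminated trees, shifted by roughly 5 * 50 / 100 = 2.5).
mean_agg = preds.mean(axis=0)

# Robust aggregation: median over trees (stays close to 10).
median_agg = np.median(preds, axis=0)

print(mean_agg)
print(median_agg)
```

With 5% of the trees contaminated, the mean is pulled noticeably upward while the median remains near the true value, which is the intuition behind why robust aggregation helps even when the individual trees are grown with the standard least-squares criterion.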

