3.8 Proceedings Paper

BROOF: Exploiting Out-of-Bag Errors, Boosting and Random Forests for Effective Automated Classification

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/2766462.2767747

Keywords

Classification; Random Forests; Boosting

Ask authors/readers for more resources

Random Forests (RF) and Boosting are two of the most successful supervised learning paradigms for automatic classification. In this work we propose to combine both strategies in order to exploit their strengths while simultaneously solving some of their drawbacks, especially when applied to high-dimensional and noisy classification tasks. More specifically, we propose a boosted version of the RE classifier (BROOF), which fits an additive model composed by several random forests (as weak learners). Differently from traditional boosting methods which exploit the training error estimate, we here use the stronger out-of-bag (OOB) error estimate which is an out-of-the-box estimate naturally produced by the bagging method used in RFs. The influence of each weak learner in the fitted additive model is inversely proportional to their OOB error. Moreover, the probability of selecting an out-of-bag training example is increased if misclassified by the simpler weak learners, in order to enable the boosted model to focus on complex regions of the input space. We also adopt a selective weight updating procedure, whereas only the out-of-bag examples are updated as the boosting iterations go by. This serves the purpose of slowing down the tendency to focus on just, a few hard-to-classify examples. By mitigating this undesired bias known to affect boosting algorithms under high dimensional and noisy scenarios due to both the selective weighting schema and a proper weak-learner effectiveness assessment we greatly improve classification effectiveness. Our experiments with several datasets in three representative high-dimensional and noisy domains topic, sentiment and microarray data classification and up to ten state-of-the-art classifiers (covering almost 500 results), show that BROOF is the only classifier to be among the top performers in all tested datasets from the topic classification domain, and in the vast majority of cases in sentiment and microarray domains, a surprising result given the knowledge that there is no single top-notch classifier for all datasets.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available