Proceedings Paper

BROOF: Exploiting Out-of-Bag Errors, Boosting and Random Forests for Effective Automated Classification

SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2015)

Pages 353-362

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/2766462.2767747

Keywords

Classification; Random Forests; Boosting


Abstract

Random Forests (RF) and Boosting are two of the most successful supervised learning paradigms for automatic classification. In this work we propose to combine both strategies in order to exploit their strengths while addressing some of their drawbacks, especially when applied to high-dimensional and noisy classification tasks. More specifically, we propose a boosted version of the RF classifier (BROOF), which fits an additive model composed of several random forests (as weak learners). Unlike traditional boosting methods, which rely on the training error estimate, we use the stronger out-of-bag (OOB) error estimate, an out-of-the-box estimate naturally produced by the bagging method used in RFs. The influence of each weak learner in the fitted additive model is inversely proportional to its OOB error. Moreover, the probability of selecting an out-of-bag training example is increased if it is misclassified by the simpler weak learners, so that the boosted model focuses on complex regions of the input space. We also adopt a selective weight-updating procedure, whereby only the weights of out-of-bag examples are updated as the boosting iterations proceed. This slows down the tendency to focus on just a few hard-to-classify examples. Boosting algorithms are known to suffer from this undesired bias in high-dimensional and noisy scenarios; by mitigating it through both the selective weighting scheme and a proper assessment of weak-learner effectiveness, we greatly improve classification effectiveness.
Our experiments with several datasets in three representative high-dimensional and noisy domains (topic, sentiment, and microarray data classification) and up to ten state-of-the-art classifiers (covering almost 500 results) show that BROOF is the only classifier to be among the top performers in all tested datasets from the topic classification domain, and in the vast majority of cases in the sentiment and microarray domains. This is a surprising result, given that no single classifier is known to be the best across all datasets.
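The boosting procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Labels are assumed binary ({-1, +1}). Each round draws one weighted bootstrap sample and treats the examples it leaves out as the out-of-bag set, a simplification of the per-tree OOB estimate a real random forest produces. The learner weight uses AdaBoost's log-odds formula, which shrinks as the OOB error grows, as an assumed stand-in for the paper's exact rule. All names (`fit_forest`, `broof_fit`, `broof_predict`, `oob_error`, `learner_weight`, `update_oob_weights`, the `floor` parameter) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

# A "forest" here is anything that maps a feature vector to a label in {-1, +1}.
Predictor = Callable[[Sequence[float]], int]
# fit_forest(X_in_bag, y_in_bag) -> Predictor; hypothetical plug-in point for a real RF.
FitFn = Callable[[List[Sequence[float]], List[int]], Predictor]


def oob_error(w: List[float], oob: Set[int], miss: Set[int]) -> float:
    """Weighted error computed only over the out-of-bag examples."""
    total = sum(w[i] for i in oob)
    return sum(w[i] for i in oob if i in miss) / total


def learner_weight(eps: float, floor: float = 1e-3) -> float:
    """Assumed AdaBoost-style log-odds weight: the larger the OOB error, the smaller the vote."""
    eps = min(max(eps, floor), 1.0 - floor)  # avoid log(0) for a perfect OOB score
    return 0.5 * math.log((1.0 - eps) / eps)


def update_oob_weights(w: List[float], oob: Set[int], miss: Set[int],
                       alpha: float) -> List[float]:
    """Selective update: only OOB examples are reweighted; in-bag weights are untouched."""
    new_w = list(w)
    for i in oob:
        # Misclassified OOB examples become more likely to be drawn next round.
        new_w[i] *= math.exp(alpha if i in miss else -alpha)
    total = sum(new_w)
    return [wi / total for wi in new_w]


def broof_fit(X: List[Sequence[float]], y: List[int], fit_forest: FitFn,
              n_rounds: int = 10, seed: int = 0) -> List[Tuple[float, Predictor]]:
    """Boost forests, scoring each one by its OOB error (binary labels in {-1, +1})."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    model: List[Tuple[float, Predictor]] = []
    for _ in range(n_rounds):
        # Weighted bootstrap: heavier examples are drawn more often.
        in_bag = rng.choices(range(n), weights=w, k=n)
        oob = set(range(n)) - set(in_bag)
        if not oob:
            continue  # no held-out examples to score this round
        forest = fit_forest([X[i] for i in in_bag], [y[i] for i in in_bag])
        miss = {i for i in oob if forest(X[i]) != y[i]}
        eps = oob_error(w, oob, miss)
        if eps >= 0.5:
            continue  # no better than chance on OOB data: discard this forest
        alpha = learner_weight(eps)
        model.append((alpha, forest))
        w = update_oob_weights(w, oob, miss, alpha)
    return model


def broof_predict(model: List[Tuple[float, Predictor]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of the fitted forests."""
    score = sum(alpha * forest(x) for alpha, forest in model)
    return 1 if score >= 0 else -1
```

To use the sketch, pass any trainer as `fit_forest` that fits a forest on the in-bag sample and returns its predict function. Because each round touches only the OOB examples (roughly a third of the data under uniform bootstrap sampling), weight shifts toward hard examples accumulate more slowly than under an AdaBoost update over the whole training set, which is one way to read the abstract's point about slowing the focus on a few hard-to-classify examples.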
