☆ 4.2 Article

A bias-variance analysis of state-of-the-art random forest text classifiers

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2021)

Journal

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Volume 15, Issue 2, Pages 379-405

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s11634-020-00409-4

Keywords

Random forests; Text classification; Bias variance analysis

Category

Statistics & Probability

Funding

CAPES
CNPq
Finep
Fapemig
MasWeb
InWeb


Smart summary

The study analyzes random forest (RF) classifier variants on noisy data through the bias-variance decomposition of the error rate, showing that significant reductions in variance, together with stable bias, explain a large part of the improvements for the lazy and boosted RF variants. The research also points to promising directions for further enhancements in RF-based learners.

Abstract

Random forest (RF) classifiers excel in a variety of automatic classification tasks, such as topic categorization and sentiment analysis. Despite such advantages, RF models have been shown to perform poorly when facing noisy data, which is common in textual data, for instance. Some RF variants have been proposed to provide better generalization capabilities under such challenging scenarios, including lazy, boosted and randomized forests, all of which exhibit significant reductions in error rate when compared to traditional RFs. In this work, we analyze the behavior of such variants under the bias-variance decomposition of the error rate. Such an analysis is essential to uncovering the main causes of the improvements in classification effectiveness observed for those variants. As we shall see, significant reductions in variance along with stability in bias explain a large portion of the improvements for the lazy and boosted RF variants. Such an analysis also sheds light on new promising directions for further enhancements in RF-based learners, such as the introduction of new randomization sources in both lazy and boosted variants.
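To make the idea of a bias-variance decomposition of the error rate concrete, the sketch below estimates bias and variance under 0-1 loss from the predictions of several models, each assumed to be trained on a different resample of the training set. It follows one common formulation (the noise-free, Domingos-style decomposition built around a "main prediction"); the abstract does not say which formulation the paper uses, so this choice is an assumption. The function name `bias_variance_01` and the toy labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def bias_variance_01(predictions, y_true):
    """Estimate average bias, variance and error under 0-1 loss.

    predictions[r][i] is the label predicted for test example i by the
    model trained on resample r (assumed layout, not from the paper).
    """
    n_runs = len(predictions)
    n_examples = len(y_true)
    bias_sum = var_sum = err_sum = 0.0
    for i in range(n_examples):
        votes = [predictions[r][i] for r in range(n_runs)]
        # Main prediction: the most frequent label across runs.
        main = Counter(votes).most_common(1)[0][0]
        # Bias: 1 if the main prediction is wrong, else 0.
        bias_sum += 1.0 if main != y_true[i] else 0.0
        # Variance: how often a single run disagrees with the main prediction.
        var_sum += sum(v != main for v in votes) / n_runs
        # Plain error rate of the individual runs, for reference.
        err_sum += sum(v != y_true[i] for v in votes) / n_runs
    return {
        "bias": bias_sum / n_examples,
        "variance": var_sum / n_examples,
        "error": err_sum / n_examples,
    }


# Toy data: 4 runs (rows) over 3 test documents (columns).
preds = [
    ["pos", "neg", "pos"],
    ["pos", "pos", "pos"],
    ["neg", "neg", "pos"],
    ["pos", "neg", "neg"],
]
y_true = ["pos", "neg", "neg"]

result = bias_variance_01(preds, y_true)
print(result)
```

In this toy run the third document is misclassified by the main prediction, so it contributes to bias, while the occasional disagreeing vote on every document contributes to variance. Under this formulation, a variant that lowers variance while keeping bias stable would show a smaller `variance` value with a roughly unchanged `bias` value.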
