Article

RFBoost: An improved multi-label boosting algorithm and its application to text categorisation

KNOWLEDGE-BASED SYSTEMS (2016)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 103, Issue -, Pages 104-117

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2016.03.029

Keywords

RFBoost; Boosting; AdaBoost.MH; Text categorisation; Labeled Latent Dirichlet Allocation; Multi-label classification

Category

Computer Science, Artificial Intelligence

Funding

Malaysia Ministry of Education [FRGS/1/2014/ICT02/UKM/01/1]

Abstract

The AdaBoost.MH boosting algorithm is considered to be one of the most accurate algorithms for multi-label classification. AdaBoost.MH works by iteratively building a committee of weak hypotheses of decision stumps. In each round of AdaBoost.MH learning, all features are examined, but only one feature is used to build a new weak hypothesis. This learning mechanism may entail a high degree of computational time complexity, particularly in the case of a large-scale dataset. This paper describes a way to manage the learning complexity and improve the classification performance of AdaBoost.MH. We propose an improved version of AdaBoost.MH, called RFBoost. The weak learning in RFBoost is based on filtering a small fixed number of ranked features in each boosting round rather than using all features, as AdaBoost.MH does. We propose two methods for ranking the features: One Boosting Round and Labeled Latent Dirichlet Allocation (LLDA), a supervised topic model based on Gibbs sampling. Additionally, we investigate the use of LLDA as a feature selection method for reducing the feature space based on the maximal conditional probabilities of words across labels. Our experimental results on eight well-known benchmarks for multi-label text categorisation show that RFBoost is significantly more efficient and effective than the baseline algorithms. Moreover, the LLDA-based feature ranking yields the best performance for RFBoost. © 2016 Elsevier B.V. All rights reserved.
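The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the filtered features are chosen in each round, so the sliding window over the ranking used here, the function names (`rank_by_max_conditional`, `fit_stump`, `train`, `predict`), the binary term-presence features, and the smoothing constant `eps` are all assumptions made for illustration. The stump itself follows the standard real-valued AdaBoost.MH formulation, and the ranking function only mimics the "maximal conditional probability across labels" criterion on a probability table that is assumed to be given (no LLDA model is fitted).

```python
import math

def rank_by_max_conditional(word_given_label):
    """Rank feature indices by max over labels of p(word | label), best first.
    word_given_label[l][f] is assumed to come from some topic model."""
    n_features = len(word_given_label[0])
    score = [max(row[f] for row in word_given_label) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: -score[f])

def fit_stump(X, Y, D, f, eps):
    """Real-valued AdaBoost.MH stump on binary feature f.
    Returns (Z, c), where c[block][label] is the stump's prediction."""
    n_labels = len(Y[0])
    # W[block][label] = [weight of negative pairs, weight of positive pairs]
    W = [[[0.0, 0.0] for _ in range(n_labels)] for _ in range(2)]
    for x, y, d in zip(X, Y, D):
        for l in range(n_labels):
            W[x[f]][l][int(y[l] > 0)] += d[l]
    Z = 2.0 * sum(math.sqrt(neg * pos) for block in W for neg, pos in block)
    c = [[0.5 * math.log((pos + eps) / (neg + eps)) for neg, pos in block]
         for block in W]
    return Z, c

def train(X, Y, ranking, n_rounds, k):
    """Boost for n_rounds, examining only k ranked features per round.
    Assumption: round t looks at the t-th window of size k in the ranking."""
    n, n_labels = len(X), len(Y[0])
    D = [[1.0 / (n * n_labels)] * n_labels for _ in range(n)]
    eps = 1.0 / (n * n_labels)
    model = []
    for t in range(n_rounds):
        start = (t * k) % len(ranking)
        best = None
        for f in ranking[start:start + k]:  # only k candidates, not all features
            z, c = fit_stump(X, Y, D, f, eps)
            if best is None or z < best[0]:
                best = (z, f, c)
        _, f, c = best
        model.append((f, c))
        # Re-weight every (example, label) pair, then normalise to sum to 1.
        total = 0.0
        for i in range(n):
            for l in range(n_labels):
                D[i][l] *= math.exp(-Y[i][l] * c[X[i][f]][l])
                total += D[i][l]
        D = [[d / total for d in row] for row in D]
    return model

def predict(model, x):
    """Return +1/-1 per label from the summed stump predictions."""
    n_labels = len(model[0][1][0])
    scores = [sum(c[x[f]][l] for f, c in model) for l in range(n_labels)]
    return [1 if s > 0 else -1 for s in scores]
```

Here `X` is a list of 0/1 feature vectors, `Y` a list of +1/-1 label vectors, and `ranking` a list of feature indices, for example the output of `rank_by_max_conditional`. Setting `k` to the total number of features makes every round examine all features, which corresponds to the plain AdaBoost.MH behaviour the abstract contrasts against.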
