Proceedings Paper

Bagging of Xgboost Classifiers with Random Under-sampling and Tomek Link for Noisy Label-imbalanced Data

Publisher

IOP PUBLISHING LTD
DOI: 10.1088/1757-899X/428/1/012004

Funding

  1. National Science Foundation of China [NSFC51475391]
  2. University-Enterprise Cooperation Project [17H1199]

Fitting label-imbalanced data with a high level of noise is one of the major challenges in the design of learning-based intelligent systems. In this paper, for the two-class problem, we propose a bagging-based algorithm that combines the Xgboost classifier (a Gradient Boosting Machine) with under-sampling approaches to overcome this challenge. To avoid model misspecification caused by imbalanced data, random sampling with replacement is employed to obtain several balanced training sets. To mitigate the misleading information produced by noise, the Tomek Link method is introduced to eliminate cross-class overlapping instances, which are the primary sources of noise. To obtain robust individual learners, we use Xgboost, a novel Gradient Boosting Machine-based classifier with a convenient parameter-tuning interface, to fit each component of the bagging ensemble. The proposed method is tested on Mandarin radio recordings (MFCC features) in a keyword recognition task, and the experimental results show that it can outperform a single Xgboost classifier, which verifies the rationality and effectiveness of the bagging scheme. The method proposed in the paper could offer a novel solution to the challenge of noisy imbalanced data classification, and the application of Xgboost in this area could also serve as an innovative contribution.
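The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several details are assumptions, since the abstract does not specify them: Tomek links are removed after under-sampling, both members of each link are dropped, and the ensemble combines predictions by majority vote. All function names are hypothetical. `CentroidClassifier` is a deliberately simple placeholder for the base learner. A real learner such as `xgboost.XGBClassifier` could probably be passed as `make_learner` instead, possibly after converting the lists to arrays.

```python
import math
import random


def tomek_link_indices(X, y):
    """Return indices of points that take part in a Tomek link.

    Two points form a Tomek link when they carry different labels and
    each one is the other's nearest neighbour.
    """
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}


def balanced_bag(X, y, rng):
    """Build one balanced bag: every minority row plus an equally sized
    sample (with replacement) of majority rows, then drop Tomek links.

    Assumption: both members of each link are removed.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    idx = minority + rng.choices(majority, k=len(minority))
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    drop = tomek_link_indices(Xb, yb)
    keep = [i for i in range(len(idx)) if i not in drop]
    return [Xb[i] for i in keep], [yb[i] for i in keep]


class CentroidClassifier:
    """Placeholder base learner (NOT Xgboost): nearest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids,
                    key=lambda c: math.dist(x, self.centroids[c]))
                for x in X]


def fit_bagging(X, y, make_learner=CentroidClassifier, n_bags=5, seed=0):
    """Fit one learner per balanced, Tomek-cleaned bag."""
    rng = random.Random(seed)
    return [make_learner().fit(*balanced_bag(X, y, rng))
            for _ in range(n_bags)]


def predict_bagging(models, X):
    """Majority vote over the ensemble (labels are assumed to be 0/1)."""
    votes = [m.predict(X) for m in models]
    return [int(2 * sum(col) > len(col)) for col in zip(*votes)]
```

A typical call would be `models = fit_bagging(X, y)` followed by `predict_bagging(models, X_new)`, where `X` is a list of feature vectors and `y` a list of 0/1 labels. The brute-force nearest-neighbour search is quadratic in the bag size, which is acceptable here only because each bag is small after under-sampling.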
