4.5 Article

A synthetic neighborhood generation based ensemble learning for the imbalanced data classification

Journal

APPLIED INTELLIGENCE
Volume 48, Issue 8, Pages 2441-2457

Publisher

SPRINGER
DOI: 10.1007/s10489-017-1088-8

Keywords

Class imbalance problem; Ensemble learning; Synthetic neighborhood generation; Diversity; Classification

Funding

  1. Science and Technology Supporting Program, Sichuan Province, China [2013GZX0138, 2014GZ0154]

Constructing effective classifiers from imbalanced datasets has emerged as one of the main challenges in the data mining community, owing to the growing prevalence of such data in various real-world domains. Ensemble solutions are often applied in this field because they can provide better classification performance than a single classifier. However, most existing methods use data sampling only to train the base classifiers on balanced datasets, not to directly enhance diversity. Thus, the performance of the final classifier can be limited. This paper proposes a new ensemble learning method that addresses the class imbalance problem and promotes diversity simultaneously. Inspired by the localized generalization error model, the method generates synthetic samples located within a local area around the training samples, and trains the base classifiers on the union of the original training samples and the synthetic neighborhood samples. By controlling the number of generated samples, the base classifiers can be trained on balanced datasets. Meanwhile, because the generated samples can extend different parts of the original input space and can differ considerably from the original training samples, the obtained base classifiers are guaranteed to be accurate and diverse. A thorough experimental study on 36 benchmark datasets was performed, and the results demonstrated that the proposed method can deliver significantly better performance than state-of-the-art ensemble solutions for imbalanced problems.
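The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how neighborhoods are sampled, which classes receive synthetic samples, or which base classifier is used. The sketch assumes uniform perturbation within a hypothetical radius `q` around each sample, generates synthetic points only for under-represented classes until all classes are equal in size, and uses a nearest-centroid rule as a stand-in base classifier. All function names and parameters are illustrative.

```python
import random
from collections import Counter

def synthetic_neighbors(samples, n_new, q, rng):
    """Draw n_new points, each uniformly within +/- q (per feature) of a random sample."""
    out = []
    for _ in range(n_new):
        base = rng.choice(samples)
        out.append([v + rng.uniform(-q, q) for v in base])
    return out

def balanced_training_set(X, y, q, rng):
    """Union of the original data and synthetic neighbors, padded to equal class sizes."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        new = synthetic_neighbors(members, target - count, q, rng)
        X_out.extend(new)
        y_out.extend([label] * len(new))
    return X_out, y_out

def fit_centroids(X, y):
    """Stand-in base classifier (assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroids(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))

def fit_ensemble(X, y, n_members=5, q=0.5, seed=0):
    """Each member sees a different random synthetic neighborhood, which is
    the source of diversity among members in this sketch."""
    rng = random.Random(seed)
    return [fit_centroids(*balanced_training_set(X, y, q, rng)) for _ in range(n_members)]

def predict_ensemble(models, x):
    """Combine the members by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy imbalanced data: six samples of class 0, two of class 1.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
     [0.2, 0.4], [0.4, 0.1], [3.0, 3.0], [3.2, 2.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
models = fit_ensemble(X, y)
label = predict_ensemble(models, [3.1, 3.0])
```

In this sketch the radius `q` controls how far synthetic samples may fall from the originals, and the number of generated samples per class controls the balance. Both are tuning choices that the abstract leaves unspecified.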
