4.7 Article

Adaptive ensemble of classifiers with regularization for imbalanced data classification

Journal

INFORMATION FUSION
Volume 69, Pages 81-102

Publisher

ELSEVIER
DOI: 10.1016/j.inffus.2020.10.017

Keywords

Adaptive ensemble; Gradient boosting machines; Regularization; Imbalanced data classification

Funding

  1. Sichuan Science and Technology Program, China [2020YFG0051]
  2. University-Enterprise Cooperation Projects, China [17H1199, 19H0355, 19H1121]


The study introduces a novel dynamic ensemble method AER to address the overfitting issue in binary imbalanced data classification through regularization and utilizing global geometry of data, demonstrating superior performance in experiments.
Dynamic ensemble selection of classifiers is an effective approach to classifying label-imbalanced data. However, the technique is prone to overfitting, owing to the lack of regularization methods and its dependence on the local geometry of the data. In this study, focusing on binary imbalanced data classification, a novel dynamic ensemble method, namely adaptive ensemble of classifiers with regularization (AER), is proposed to overcome these limitations. The method addresses the overfitting problem from a new perspective, that of implicit regularization. Specifically, it leverages the properties of stochastic gradient descent to obtain the minimum-norm solution, thereby achieving regularization; furthermore, it interpolates the ensemble weights by exploiting the global geometry of the data to further prevent overfitting. According to our theoretical proofs, the seemingly complicated AER paradigm, in addition to its regularization capabilities, can actually reduce the asymptotic time and memory complexities of several other algorithms. We evaluate the proposed AER method on seven benchmark imbalanced datasets from the UCI machine learning repository and one artificially generated GMM-based dataset with five variations. The results show that the proposed algorithm outperforms the major existing algorithms on multiple metrics in most cases, and two hypothesis tests (McNemar's and Wilcoxon tests) further verify the statistical significance. In addition, the proposed method has other desirable properties, such as particular advantages in dealing with highly imbalanced data, and it pioneers research on regularization for dynamic ensemble methods.
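The implicit-regularization idea mentioned in the abstract can be illustrated with a small, generic sketch. It is not the AER algorithm itself: it assumes a plain least-squares objective, uses full-batch gradient descent instead of stochastic gradient descent, and all function names and the toy data are hypothetical. It only shows the well-known fact that, for an underdetermined linear system, gradient descent started from zero converges to the minimum-norm solution.

```python
def gradient_descent_least_squares(A, b, lr, steps):
    """Minimize 0.5 * ||A w - b||^2 by gradient descent, starting from w = 0."""
    n_rows, n_cols = len(A), len(A[0])
    w = [0.0] * n_cols  # zero initialization keeps w in the row space of A
    for _ in range(steps):
        # Residual r = A w - b
        r = [sum(A[i][j] * w[j] for j in range(n_cols)) - b[i]
             for i in range(n_rows)]
        # Gradient g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(n_rows))
             for j in range(n_cols)]
        w = [w[j] - lr * g[j] for j in range(n_cols)]
    return w


def min_norm_solution_two_rows(A, b):
    """Closed form w* = A^T (A A^T)^{-1} b for a full-rank matrix with 2 rows."""
    g00 = sum(x * x for x in A[0])
    g01 = sum(x * y for x, y in zip(A[0], A[1]))
    g11 = sum(y * y for y in A[1])
    det = g00 * g11 - g01 * g01
    # Solve (A A^T) z = b using the explicit 2x2 inverse
    z0 = (g11 * b[0] - g01 * b[1]) / det
    z1 = (-g01 * b[0] + g00 * b[1]) / det
    return [z0 * A[0][j] + z1 * A[1][j] for j in range(len(A[0]))]


def squared_norm(w):
    return sum(x * x for x in w)


# Toy underdetermined system (2 equations, 3 unknowns); values are illustrative.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [3.0, 2.0]

w_gd = gradient_descent_least_squares(A, b, lr=0.1, steps=5000)
w_star = min_norm_solution_two_rows(A, b)
w_other = [3.0, 0.0, 2.0]  # another exact solution, with a larger norm
```

In this toy problem `w_gd` and `w_star` coincide up to floating-point error, while `w_other` also satisfies the equations but has a larger norm. No explicit penalty term appears in the objective; the preference for the small-norm solution comes from the optimizer and its initialization.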
