Article

Boosting the performance of over-sampling algorithms through under-sampling the minority class

Journal

NEUROCOMPUTING
Volume 343, Pages 3-18

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2018.04.088

Keywords

Imbalanced learning; Over-sampling; Under-sampling; Noisy data

Funding

  1. Brazilian Research Council (CNPq) [132229/2016-1]

Over-sampling algorithms are the most widely adopted approach to balancing class distributions in imbalanced data problems, either by randomly replicating minority-class examples or by synthesising new ones. Current over-sampling algorithms, however, typically use all available minority-class examples to synthesise new instances, which may include noisy or outlier data. This work proposes k-INOS, a new algorithm that prevents over-sampling algorithms from being contaminated by noisy examples in the minority class. k-INOS is based on the concept of neighbourhood of influence and works as a wrapper around any over-sampling algorithm. A comprehensive experimental study tested k-INOS on 50 benchmark data sets with 8 over-sampling methods and 5 classifiers, measuring performance with 7 metrics and the Wilcoxon signed-ranks test. Results showed that k-INOS significantly improved the performance of over-sampling algorithms on most metrics, particularly (but not exclusively) for weak classifiers. Further investigation also identified conditions under which k-INOS is likely to increase performance, based on features and rates measured from the data sets. This extensive experimental framework establishes k-INOS as an efficient algorithm to apply prior to over-sampling methods. (C) 2019 Elsevier B.V. All rights reserved.
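The abstract describes a noise filter applied to the minority class before any over-sampling step. A minimal sketch of that idea is shown below, assuming a simplified k-nearest-neighbour noise criterion (a minority example is dropped if none of its k nearest neighbours share its label) in place of the paper's exact neighbourhood-of-influence definition; the function names and parameters are hypothetical, and random replication stands in for the arbitrary over-sampler the wrapper would normally call.

```python
import numpy as np

def filter_noisy_minority(X, y, minority_label=1, k=5, min_same_class=1):
    """Drop minority examples whose k nearest neighbours are (almost)
    all majority-class points. This is a simplified stand-in for the
    paper's neighbourhood-of-influence test, for illustration only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = []
    for i in range(len(X)):
        if y[i] != minority_label:
            keep.append(i)          # majority examples pass through untouched
            continue
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf               # exclude the point itself
        nn = np.argsort(d)[:k]     # indices of the k nearest neighbours
        if np.sum(y[nn] == minority_label) >= min_same_class:
            keep.append(i)
    keep = np.array(keep)
    return X[keep], y[keep]

def random_oversample(X, y, minority_label=1, rng=None):
    """Balance classes by randomly replicating the (cleaned) minority
    examples; any over-sampler could be wrapped here instead."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mino = np.where(y == minority_label)[0]
    majo = np.where(y != minority_label)[0]
    if len(mino) >= len(majo):
        return X, y                 # already balanced
    extra = rng.choice(mino, size=len(majo) - len(mino), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])
```

Used as a pipeline, `filter_noisy_minority` would run first so that an isolated minority outlier deep inside the majority region is never replicated or used to synthesise new points, which is the contamination the abstract warns against.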
