Journal
NEUROCOMPUTING
Volume 343, Pages 3-18
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2018.04.088
Keywords
Imbalanced learning; Over-sampling; Under-sampling; Noisy data
Funding
- Brazilian Research Council (CNPq) [132229/2016-1]
Abstract
Over-sampling algorithms are the most widely adopted approach to balancing class distributions in imbalanced data problems, through random replication or synthesis of new examples in the minority class. Current over-sampling algorithms, however, usually use all available examples in the minority class to synthesise new instances, which may include noisy or outlier data. This work proposes k-INOS, a new algorithm that prevents over-sampling algorithms from being contaminated by noisy examples in the minority class. k-INOS is based on the concept of neighbourhood of influence and works as a wrapper around any over-sampling algorithm. Comprehensive experiments were conducted to test k-INOS on 50 benchmark data sets, with 8 over-sampling methods and 5 classifiers, and performance was measured according to 7 metrics and the Wilcoxon signed-ranks test. Results showed that, particularly (but not only) for weak classifiers, k-INOS significantly improved the performance of over-sampling algorithms on most performance metrics. Further investigation also identified conditions under which k-INOS is likely to increase performance, according to features and rates measured from the data sets. The extensive experimental framework evidenced k-INOS as an efficient algorithm to apply prior to over-sampling methods. (C) 2019 Elsevier B.V. All rights reserved.
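The wrapper idea described in the abstract (filter noisy minority examples, then hand the cleaned minority class to any over-sampler) can be sketched as follows. This is an illustrative sketch only: the paper's actual neighbourhood-of-influence criterion is not specified in the abstract, so a simple k-nearest-neighbour noise test is used here as a stand-in, and random replication stands in for the wrapped over-sampler. The function names `knn_noise_filter` and `oversample` are hypothetical.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def knn_noise_filter(X_min, X_maj, k=5):
    """Drop minority examples whose k nearest neighbours are all majority.

    A simple stand-in for k-INOS's neighbourhood-of-influence test, which
    the abstract does not describe in detail: a minority point surrounded
    exclusively by majority points is treated as noise/outlier.
    """
    labelled = [(x, 1) for x in X_min] + [(x, 0) for x in X_maj]
    kept = []
    for x in X_min:
        neighbours = sorted(
            (p for p in labelled if p[0] is not x),
            key=lambda p: dist(x, p[0]),
        )[:k]
        # Keep the example if at least one neighbour is also minority.
        if any(label == 1 for _, label in neighbours):
            kept.append(x)
    return kept


def oversample(X_min, X_maj, seed=0):
    """Wrapper: filter the minority class, then over-sample it.

    Random replication is used here purely for illustration; in the
    paper's setting, any over-sampling algorithm (e.g. SMOTE) could be
    plugged in at this step.
    """
    rng = random.Random(seed)
    clean = knn_noise_filter(X_min, X_maj)
    synthetic = [rng.choice(clean) for _ in range(len(X_maj) - len(clean))]
    return clean + synthetic
```

For example, with two minority points clustered together and one isolated minority point inside a majority cluster, the filter discards the isolated point before replication, so no synthetic examples are generated from it.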