Journal
NEUROCOMPUTING
Volume 343, Pages 3-18
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2018.04.088
Keywords
Imbalanced learning; Over-sampling; Under-sampling; Noisy data
Funding
- Brazilian Research Council (CNPq) [132229/2016-1]
Abstract
Over-sampling algorithms are the most widely adopted approach to balancing class distributions in imbalanced data problems, through random replication or synthesis of new examples in the minority class. Current over-sampling algorithms, however, usually use all available examples in the minority class to synthesise new instances, which may include noisy or outlier data. This work proposes k-INOS, a new algorithm that prevents over-sampling algorithms from being contaminated by noisy examples in the minority class. k-INOS is based on the concept of neighbourhood of influence and works as a wrapper around any over-sampling algorithm. Comprehensive experiments were conducted to test k-INOS on 50 benchmark data sets, with 8 over-sampling methods and 5 classifiers, and performance was measured according to 7 metrics and the Wilcoxon signed-ranks test. Results showed that, particularly (but not only) for weak classifiers, k-INOS significantly improved the performance of over-sampling algorithms on most performance metrics. Further investigation also identified conditions under which k-INOS is likely to increase performance, according to features and rates measured from the data sets. The extensive experimental framework evidenced k-INOS as an efficient algorithm to apply prior to over-sampling methods. (C) 2019 Elsevier B.V. All rights reserved.
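The wrapper idea described in the abstract (filter noisy minority examples, then hand the cleaned minority class to any over-sampler) can be sketched as follows. This is an illustrative sketch only: the paper's actual neighbourhood-of-influence criterion is not specified in the abstract, so a simple k-nearest-neighbour noise test is used here as a stand-in, and random replication stands in for the wrapped over-sampler. The function names `knn_noise_filter` and `oversample` are hypothetical.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def knn_noise_filter(X_min, X_maj, k=5):
    """Drop minority examples whose k nearest neighbours are all majority.

    A simple stand-in for k-INOS's neighbourhood-of-influence test, which
    the abstract does not describe in detail: a minority point surrounded
    exclusively by majority points is treated as noise/outlier.
    """
    labelled = [(x, 1) for x in X_min] + [(x, 0) for x in X_maj]
    kept = []
    for x in X_min:
        neighbours = sorted(
            (p for p in labelled if p[0] is not x),
            key=lambda p: dist(x, p[0]),
        )[:k]
        # Keep the example if at least one neighbour is also minority.
        if any(label == 1 for _, label in neighbours):
            kept.append(x)
    return kept


def oversample(X_min, X_maj, seed=0):
    """Wrapper: filter the minority class, then over-sample it.

    Random replication is used here purely for illustration; in the
    paper's setting, any over-sampling algorithm (e.g. SMOTE) could be
    plugged in at this step.
    """
    rng = random.Random(seed)
    clean = knn_noise_filter(X_min, X_maj)
    synthetic = [rng.choice(clean) for _ in range(len(X_maj) - len(clean))]
    return clean + synthetic
```

For example, with two minority points clustered together and one isolated minority point inside a majority cluster, the filter discards the isolated point before replication, so no synthetic examples are generated from it.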