Article

On the effectiveness of preprocessing methods when dealing with different levels of class imbalance

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 25, Issue 1, Pages 13-21

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2011.06.013

Keywords

Imbalance; Resampling; Classification; Performance measures; Multi-dimensional scaling

Funding

  1. Spanish Ministry of Education and Science [CSD2007-00018, TIN2009-14205]
  2. Fundació Caixa Castelló - Bancaixa [P1-1B2009-04]


This paper investigates how both the imbalance ratio and the classifier influence the performance of several resampling strategies for dealing with imbalanced data sets. The study focuses on evaluating how learning is affected when different resampling algorithms transform the originally imbalanced data into artificially balanced class distributions. Experiments over 17 real data sets using eight different classifiers, four resampling algorithms and four performance evaluation measures show that over-sampling the minority class consistently outperforms under-sampling the majority class when data sets are strongly imbalanced, whereas there are no significant differences for data sets with low imbalance. Results also indicate that the classifier has very little influence on the effectiveness of the resampling strategies. © 2011 Elsevier B.V. All rights reserved.
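
The contrast between over-sampling the minority class and under-sampling the majority class can be illustrated with a minimal sketch. The abstract does not name the four resampling algorithms used in the study, so the code below assumes the simplest random variants of each strategy; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class is as large as the majority class (assumed random variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the minority class
    (assumed random variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


# Hypothetical toy data: 90 majority samples (label 0), 10 minority (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(imbalance_ratio(y), len(y))              # ratio and size before resampling
print(imbalance_ratio(y_over), len(y_over))    # balanced by adding minority copies
print(imbalance_ratio(y_under), len(y_under))  # balanced by discarding majority samples
```

Both strategies reach a balanced class distribution, but under-sampling discards most of the majority class when the imbalance is strong. This is one plausible reading of why the two approaches diverge on strongly imbalanced data; the paper itself should be consulted for the actual algorithms and analysis.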
