Article

EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling

Journal

PATTERN RECOGNITION
Volume 46, Issue 12, Pages 3460-3471

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2013.05.006

Keywords

Classification; Imbalanced data-sets; Ensembles; Class distribution; Kappa-error diagrams; Boosting

Funding

  1. Spanish Ministry of Education and Science [TIN2010-15055, TIN2011-28488]
  2. Andalusian Research Plan [P10-TIC-6858]

Classification with imbalanced data-sets has become one of the most challenging problems in Data Mining. When one class is much more heavily represented than the other, undesirable effects arise in both the learning and classification processes, mainly regarding the minority class. Tackling this problem requires accurate tools; lately, ensembles of classifiers have emerged as a possible solution. Among ensemble proposals, the combination of Bagging and Boosting with preprocessing techniques has proved its ability to enhance the classification of the minority class. In this paper, we develop a new ensemble construction algorithm (EUSBoost) based on RUSBoost, one of the simplest and most accurate ensembles, which combines random undersampling with the Boosting algorithm. Our methodology aims to improve on existing proposals by enhancing the performance of the base classifiers through evolutionary undersampling. In addition, we promote diversity by favoring the use of different subsets of majority class instances to train each base classifier. Focusing on two-class highly imbalanced problems, we will show, supported by the proper statistical analysis, that EUSBoost is able to outperform the state-of-the-art ensemble-based methods. We will also analyze its advantages using kappa-error diagrams, which we adapt to the imbalanced scenario. © 2013 Elsevier Ltd. All rights reserved.
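
The idea of combining random undersampling with Boosting, which the abstract attributes to RUSBoost, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes labels in {+1, -1}, a simplified AdaBoost-style weight update, and decision stumps as base classifiers, and the published RUSBoost and EUSBoost algorithms may differ in these details. All function names are hypothetical. EUSBoost would replace the random choice inside `undersample` with an evolutionary search over majority class subsets, which is not shown here.

```python
import math
import random


def undersample(y, rng):
    """Indices of all minority instances plus an equal-sized random majority subset."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def fit_stump(X, y, w, idx):
    """Weighted decision stump (feature, threshold, sign) fitted on instances in idx."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({X[i][f] for i in idx}):
            for sign in (1, -1):
                # Weighted error of the rule "predict sign if x[f] >= thr, else -sign".
                err = sum(w[i] for i in idx
                          if (sign if X[i][f] >= thr else -sign) != y[i])
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best[1:]


def stump_predict(stump, x):
    f, thr, sign = stump
    return sign if x[f] >= thr else -sign


def boost_with_undersampling(X, y, rounds=10, seed=0):
    """Each round trains on a fresh balanced subset, then reweights ALL instances."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        idx = undersample(y, rng)  # a different majority subset per round
        stump = fit_stump(X, y, w, idx)
        preds = [stump_predict(stump, x) for x in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified instances gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * p * t) for wi, p, t in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Because each round draws a different majority subset, the base classifiers see different training data, which is one simple way to obtain the diversity the abstract refers to.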
