EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling

PATTERN RECOGNITION (2013)

Journal

PATTERN RECOGNITION

Volume 46, Issue 12, Pages 3460-3471

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.patcog.2013.05.006

Keywords

Classification; Imbalanced data-sets; Ensembles; Class distribution; Kappa-error diagrams; Boosting

Funding

Spanish Ministry of Education and Science [TIN2010-15055, TIN2011-28488]
Andalusian Research Plan [P10-TIC-6858]

Abstract

Classification with imbalanced data-sets has become one of the most challenging problems in Data Mining. When one class is far more represented than the other, undesirable effects arise in both the learning and classification processes, mainly for the minority class. Tackling this problem requires accurate tools; lately, ensembles of classifiers have emerged as a possible solution. Among ensemble proposals, the combination of Bagging and Boosting with preprocessing techniques has proved its ability to enhance the classification of the minority class. In this paper, we develop a new ensemble construction algorithm (EUSBoost) based on RUSBoost, one of the simplest and most accurate ensembles, which combines random undersampling with the Boosting algorithm. Our methodology aims to improve on existing proposals by enhancing the performance of the base classifiers through evolutionary undersampling. In addition, we promote diversity by favoring the use of different subsets of majority class instances to train each base classifier. Focusing on two-class highly imbalanced problems, we will show, supported by the proper statistical analysis, that EUSBoost is able to outperform the state-of-the-art ensemble-based methods. We will also analyze its advantages using kappa-error diagrams, which we adapt to the imbalanced scenario. (C) 2013 Elsevier Ltd. All rights reserved.
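To make the idea of combining random undersampling with Boosting more concrete, the following is a minimal sketch and not the authors' implementation. It assumes labels in {-1, +1} with +1 as the minority class, uses a hand-written decision stump as the base classifier, and applies a simplified discrete AdaBoost weight update rather than the exact update used by RUSBoost. All function names (`train_stump`, `rus_boost_sketch`, `predict`) are hypothetical. In an EUSBoost-style variant, the random sampling line would be replaced by an evolutionary search over majority-class subsets, which is not shown here.

```python
import math
import random


def train_stump(X, y, w):
    """Fit a weighted decision stump; returns (feature, threshold, polarity)."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # Weighted error of the rule "predict pol if x[j] >= thr, else -pol".
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= thr else -pol) != yi)
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best


def stump_predict(stump, row):
    j, thr, pol = stump
    return pol if row[j] >= thr else -pol


def rus_boost_sketch(X, y, rounds=10, seed=0):
    """Boosting where each round trains on a randomly undersampled, balanced subset.

    Assumption: y uses +1 for the minority class and -1 for the majority class,
    and the majority class is at least as large as the minority class.
    """
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(rounds):
        # Random undersampling: keep every minority instance and an equally
        # sized random subset of the majority class (a fresh subset per round).
        kept = minority + rng.sample(majority, len(minority))
        total_kept = sum(w[i] for i in kept)
        stump = train_stump([X[i] for i in kept],
                            [y[i] for i in kept],
                            [w[i] / total_kept for i in kept])
        # Evaluate the weak learner on the full training set.
        preds = [stump_predict(stump, row) for row in X]
        err = sum(w[i] for i in range(n) if preds[i] != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Discrete AdaBoost update: raise the weight of misclassified instances.
        w = [w[i] * math.exp(-alpha * y[i] * preds[i]) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps; ties go to the minority class (+1)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Because a different majority-class subset is drawn in every round, the base classifiers see different training sets, which is one simple way to encourage the kind of diversity the abstract refers to.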
