Article

Under-sampling class imbalanced datasets by combining clustering analysis and instance selection

Journal

INFORMATION SCIENCES
Volume 477, Pages 47-54

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2018.10.029

Keywords

Data mining; Class imbalance; Clustering; Ensemble classifiers; Instance selection

Funding

  1. Ministry of Science and Technology of Taiwan [MOST 106-2410-H-182-024]
  2. Featured Areas Research Center Program within the Ministry of Education (MOE) of Taiwan of the Healthy Aging Research Center, Chang Gung University [EMRPD1H0421, EMRPD1H0551]
  3. Chang Gung Memorial Hospital, Linkou [NERPD2G0301T]
  4. Center for Innovative Research on Aging Society from The Featured Areas Research Center Program within the Ministry of Education (MOE) in Taiwan


Class-imbalanced datasets, i.e., those in which one class contains far more data samples than another, occur in many real-world problems. With such datasets, it is very difficult to construct effective classifiers using current classification algorithms, especially for distinguishing small or minority classes from the majority class. To address the class imbalance problem, under- and oversampling techniques have been widely used to reduce the number of majority-class samples and enlarge the number of minority-class samples, respectively. Moreover, combinations of certain sampling approaches with ensemble classifiers have shown reasonably good performance. In this paper, a novel undersampling approach called cluster-based instance selection (CBIS), which combines clustering analysis and instance selection, is introduced. The clustering analysis component groups similar data samples of the majority-class dataset into 'subclasses', while the instance selection component filters out unrepresentative data samples from each of the 'subclasses'. Experimental results based on the KEEL dataset repository show that the CBIS approach makes bagging- and boosting-based MLP ensemble classifiers perform significantly better than six state-of-the-art approaches, regardless of which clustering (affinity propagation and k-means) and instance selection (IB3, DROP3 and GA) algorithms are used. (C) 2018 Elsevier Inc. All rights reserved.
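The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: k-means is used for the clustering stage (the paper also evaluates affinity propagation), and as a simplified stand-in for the IB3/DROP3/GA instance selection algorithms, each subclass keeps only the fraction of samples closest to its centroid. The function name `cbis_undersample` and the `keep_frac` parameter are hypothetical.

```python
import numpy as np

def cbis_undersample(X_maj, n_clusters=3, keep_frac=0.5, n_iter=20, seed=0):
    """Sketch of CBIS-style undersampling of the majority class.

    Stage 1: cluster majority-class samples into 'subclasses'
             (plain k-means here; the paper also uses affinity propagation).
    Stage 2: within each subclass, filter out unrepresentative samples.
             As a simplified stand-in for IB3/DROP3/GA, keep the
             `keep_frac` fraction of samples nearest the subclass centroid.
    """
    X_maj = np.asarray(X_maj, dtype=float)
    rng = np.random.default_rng(seed)

    # --- Stage 1: simple k-means over the majority class ---
    centroids = X_maj[rng.choice(len(X_maj), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # distance of every sample to every centroid
        d = np.linalg.norm(X_maj[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = X_maj[labels == k].mean(axis=0)

    # --- Stage 2: per-subclass instance selection (centroid-proximity) ---
    kept = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue
        dist = np.linalg.norm(X_maj[members] - centroids[k], axis=1)
        n_keep = max(1, int(np.ceil(keep_frac * members.size)))
        kept.append(members[np.argsort(dist)[:n_keep]])
    return X_maj[np.concatenate(kept)]
```

The reduced majority set returned here would then be merged with the untouched minority class and fed to bagging- or boosting-based ensemble training, as the abstract describes.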

