Article

Sparse projection infinite selection ensemble for imbalanced classification

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 262

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.110246

Keywords

Imbalanced classification; Graph-based methods; Random projections; Ensemble learning


Abstract

Imbalanced datasets pose frequent and challenging problems in many real-world applications. Classification models are often biased towards the majority class when learning from class-imbalanced data. Typical imbalanced learning (IL) approaches, e.g., SMOTE, AdaCost, and Cascade, often suffer from poor performance in complex tasks where class overlapping or a high imbalance ratio occurs. In this paper, we systematically investigate the IL problem and propose a novel framework named sparse projection infinite selection ensemble (SPISE). SPISE iteratively resamples balanced subsets and combines the classifiers trained on these subsets for imbalanced classification. The diversity of classifier ensembles and the similarity between the subsets and the whole dataset are considered in this process. Specifically, we present a graph-based approach named infinite subset selection to adaptively sample diverse and similar subsets. Additionally, a random sparse projection is combined with feature selection at the beginning of each iteration to augment the training features and enhance the diversity of the generated subsets. SPISE can be easily adapted to most existing classifiers (e.g., support vector machine and random forest) to boost their performance for IL. Quantitative experiments on 26 imbalanced benchmark datasets substantiate the effectiveness and superiority of the proposed model compared with other popular approaches.

© 2023 Elsevier B.V. All rights reserved.
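The resampling-plus-ensemble pipeline described in the abstract can be sketched as follows. This is a hedged, minimal illustration, not the authors' SPISE code: the graph-based infinite subset selection is replaced here by plain random balanced undersampling, the base learner is a simple nearest-centroid classifier standing in for SVM/random forest, and the projection dimensionality and Achlioptas-style sparse matrix are assumptions. It shows only the overall shape: per iteration, augment features with a random sparse projection, draw a class-balanced subset, fit a base classifier, then predict by majority vote.

```python
import numpy as np

def sparse_projection(d_in, d_out, rng, s=3):
    """Achlioptas-style sparse random matrix: entries are
    +-sqrt(s/d_out) with probability 1/(2s) each, 0 otherwise."""
    probs = [1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]
    vals = rng.choice([-1.0, 0.0, 1.0], size=(d_in, d_out), p=probs)
    return vals * np.sqrt(s / d_out)

def balanced_subset(y, rng):
    """Indices of a class-balanced subset: undersample every class
    down to the minority-class size (stand-in for the paper's
    graph-based infinite subset selection)."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    return np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])

class CentroidClassifier:
    """Toy base learner (nearest class centroid); the paper plugs in
    stronger classifiers such as SVM or random forest."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(axis=-1)
        return self.classes_[d.argmin(axis=1)]

def fit_ensemble(X, y, n_estimators=15, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_estimators):
        # Feature augmentation: original features plus a random
        # sparse projection of them (projection dim is an assumption).
        P = sparse_projection(X.shape[1], X.shape[1], rng)
        Xa = np.hstack([X, X @ P])
        idx = balanced_subset(y, rng)
        clf = CentroidClassifier().fit(Xa[idx], y[idx])
        members.append((P, clf))
    return members

def predict_ensemble(members, X):
    votes = np.stack([clf.predict(np.hstack([X, X @ P]))
                      for P, clf in members])
    # Majority vote across base classifiers (binary 0/1 labels assumed).
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because every base classifier sees a class-balanced subset, no single member is dominated by the majority class, while the per-iteration random projections keep the members diverse enough for voting to help.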


