Article

Spatial Distribution-Based Imbalanced Undersampling

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 35, Issue 6, Pages 6376-6391

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2022.3161537

Keywords

Training; Costs; Biological neural networks; Machine learning algorithms; Clustering algorithms; Signal processing algorithms; Proposals; Undersampling; imbalance; local pattern; sphere neighborhood; ensemble learning

This paper proposes Spatial Distribution-based UnderSampling (SDUS) for dealing with class-imbalance problems. SDUS maintains the distribution pattern of original data by learning majority-class local patterns and selecting samples using two strategies. Experimental results demonstrate the effectiveness of SDUS in maintaining the underlying distribution characteristics.
Undersampling is one of the most popular techniques for dealing with class-imbalance problems. Various undersampling methods have emerged over the past few decades, and each is superior in some scenarios. However, it remains a challenge to select representative majority-class samples so that the structure of the selected groups follows the underlying imbalanced distribution. To address this, the paper proposes Spatial Distribution-based UnderSampling (SDUS) for imbalanced learning. SDUS uses a supervised constructive process to learn majority-class local patterns in terms of sphere neighborhoods (SPN). Two sample selection strategies, a top-down strategy and a bottom-up strategy, are proposed; each selects majority-class sample subsets from a different perspective while maintaining the distribution pattern of the original data. SDUS also introduces an ensemble technique that improves learning performance by exploiting the diversity caused by the randomness of the local-pattern learning process. Numerical experiments on 38 typical datasets from the KEEL repository, against 13 state-of-the-art comparison methods, demonstrate the effectiveness of SDUS in maintaining the underlying distribution characteristics for imbalanced undersampling. A Python implementation of the proposed SDUS is available at https://github.com/ytyancp/SDUS.
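The abstract does not spell out how sphere neighborhoods are built or how the two selection strategies work, so the following is only a minimal sketch of the general idea and not the authors' SDUS implementation. It assumes that each sphere is centered on a randomly chosen uncovered majority-class sample and that its radius is the distance to the nearest minority-class sample. In place of the paper's top-down and bottom-up strategies, it substitutes a simple proportional quota per sphere. The function names `learn_sphere_neighborhoods` and `undersample` are hypothetical.

```python
import math
import random


def learn_sphere_neighborhoods(majority, minority, rng):
    """Cover the majority samples with spheres that exclude minority samples.

    ASSUMPTION: the radius rule (distance from the center to the nearest
    minority sample) is illustrative; the abstract does not define it.
    """
    uncovered = set(range(len(majority)))
    spheres = []  # each entry: (center_index, radius, member_indices)
    while uncovered:
        # Random center choice is the source of run-to-run diversity.
        center = rng.choice(sorted(uncovered))
        radius = min(math.dist(majority[center], m) for m in minority)
        members = [
            i for i in sorted(uncovered)
            if math.dist(majority[center], majority[i]) < radius
        ]
        if not members:
            # Radius is zero (center coincides with a minority point):
            # keep the center as a singleton sphere so the loop terminates.
            members = [center]
        uncovered.difference_update(members)
        spheres.append((center, radius, members))
    return spheres


def undersample(majority, minority, n_select, seed=0):
    """Pick roughly n_select majority indices, spread across the spheres.

    ASSUMPTION: a proportional quota per sphere stands in for the paper's
    top-down and bottom-up strategies, which the abstract does not detail.
    """
    rng = random.Random(seed)
    spheres = learn_sphere_neighborhoods(majority, minority, rng)
    selected = []
    for _center, _radius, members in spheres:
        # Larger spheres contribute more samples; every sphere contributes
        # at least one, so the total can exceed n_select.
        quota = max(1, round(n_select * len(members) / len(majority)))
        selected.extend(rng.sample(members, min(quota, len(members))))
    return sorted(selected)
```

Calling `undersample` with different `seed` values yields different majority subsets. An ensemble in the spirit of the one described above could train one base classifier per subset and combine their predictions, although the combination rule used by SDUS is not given here.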
