Article

Minority-prediction-probability-based oversampling technique for imbalanced learning

Journal

INFORMATION SCIENCES
Volume 622, Pages 1273-1295

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.11.148

Keywords

Imbalanced learning; Oversampling technique; Prediction probability; Selection probability; Sample generation

In this study, a new oversampling method called MPP-SMOTE is proposed to address the issue of imbalanced learning. The method removes noisy samples and divides the minority samples into two types based on their probability of belonging to the minority class. It then separately selects and generates synthetic samples for each type based on different sample-generation schemes. Experimental results demonstrate that MPP-SMOTE outperforms other oversampling methods in terms of imbalanced-learning metrics for common classifiers.
In this study, we propose an oversampling method called the minority-predictive-probability-based synthetic minority oversampling technique (MPP-SMOTE) for imbalanced learning. First, MPP-SMOTE removes noisy samples from minority classes. Subsequently, it divides minority samples into two types (hard-to-learn and easy-to-learn) by predicting the probability of samples belonging to the minority class. For both sample types, we adopt a divide-and-conquer strategy: we separately calculate the probability of each sample being selected to generate a new synthetic sample. For hard-to-learn samples, the selection probability considers the relative density of a sample in both the majority and minority classes; for easy-to-learn samples, it considers the relative density in the minority class only. Finally, according to the types and selection probabilities, MPP-SMOTE separately selects samples and generates synthetic samples from them by using different sample-generation schemes. Experimental results reveal that the proposed method outperforms other oversampling methods in terms of three imbalanced-learning metrics for three common classifiers. According to the results, when a support vector machine classifier is applied, the area-under-the-curve (AUC) performance of MPP-SMOTE improves by 1.44%. © 2022 Elsevier Inc. All rights reserved.
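
The pipeline described in the abstract (noise removal, hard/easy split, per-type selection probabilities, per-type generation) can be illustrated with a minimal sketch. The abstract does not give the formulas, so every concrete choice below is an assumption made for illustration and not the authors' method: the minority probability is approximated by the share of minority points among the k nearest neighbours, a sample is treated as noise when that share is zero, density is approximated by the inverse mean k-nearest-neighbour distance, the two groups share the sampling budget in proportion to their size, and hard-to-learn seeds interpolate over a shorter range than easy-to-learn ones. All function and parameter names (`mpp_smote_sketch`, `threshold`, and so on) are hypothetical. The sketch assumes a non-empty majority class and points given as tuples of floats.

```python
import math
import random


def nearest(x, pool, k):
    """Return the k points of `pool` closest to x, skipping x itself."""
    others = [q for q in pool if q is not x]
    return sorted(others, key=lambda q: math.dist(x, q))[:k]


def mean_dist(x, pool, k):
    """Mean distance from x to its k nearest points in `pool` (inverse-density proxy)."""
    nbrs = nearest(x, pool, k)
    return sum(math.dist(x, q) for q in nbrs) / len(nbrs)


def minority_probability(x, minority, majority, k):
    """ASSUMED stand-in for the prediction step: minority share among the k nearest points."""
    labelled = [(q, 1) for q in minority if q is not x] + [(q, 0) for q in majority]
    labelled.sort(key=lambda t: math.dist(x, t[0]))
    top = labelled[:k]
    return sum(label for _, label in top) / len(top)


def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def mpp_smote_sketch(minority, majority, n_new, k=3, threshold=0.5, seed=0):
    """Generate n_new synthetic minority points (illustrative sketch only)."""
    rng = random.Random(seed)
    probs = [minority_probability(x, minority, majority, k) for x in minority]

    # Step 1: noise removal (ASSUMED rule: no minority point among the k nearest).
    kept = [(x, p) for x, p in zip(minority, probs) if p > 0.0]
    clean = [x for x, _ in kept]
    if len(clean) < 2:
        raise ValueError("need at least two non-noisy minority samples")

    # Step 2: split by predicted minority probability (ASSUMED threshold).
    hard = [x for x, p in kept if p < threshold]
    easy = [x for x, p in kept if p >= threshold]

    eps = 1e-12  # guards against zero distances between duplicate points

    def hard_weight(x):
        # ASSUMED: minority density relative to minority plus majority density.
        d_min, d_maj = mean_dist(x, clean, k), mean_dist(x, majority, k)
        return (d_maj + eps) / (d_min + d_maj + 2 * eps)

    def easy_weight(x):
        # ASSUMED: minority class only, favouring sparse minority regions.
        return mean_dist(x, clean, k) + eps

    # Step 3: selection probabilities, computed separately for each type.
    seeds, weights, max_gap = [], [], []
    for group, weight_fn, gap in ((hard, hard_weight, 0.5), (easy, easy_weight, 1.0)):
        if not group:
            continue
        share = len(group) / len(clean)  # ASSUMED budget split by group size
        for x, w in zip(group, normalise([weight_fn(x) for x in group])):
            seeds.append(x)
            weights.append(w * share)
            max_gap.append(gap)

    # Step 4: SMOTE-style interpolation with a per-type interpolation range.
    synthetic = []
    for i in rng.choices(range(len(seeds)), weights=weights, k=n_new):
        s = seeds[i]
        nbr = rng.choice(nearest(s, clean, k))
        gap = rng.random() * max_gap[i]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(s, nbr)))
    return synthetic
```

Because each synthetic point is a convex combination of two retained minority points, the output stays inside the bounding box of the non-noisy minority samples. A real implementation would likely replace the neighbour-share estimate with a trained probabilistic classifier and use the density definitions from the paper itself.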
