Article

ADDPC-SMOTE: An Oversampling Algorithm Based on Density Difference Peak Clustering and Spatial Distribution Entropy

Journal

IEEE ACCESS
Volume 11, Pages 108152-108166

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3320265

Keywords

Density difference peak clustering; spatial distribution entropy; oversampling algorithm; class overlap

This paper proposes an oversampling algorithm based on Adaptive Density Difference Peak Clustering and Spatial Distribution Entropy, which takes the spatial distribution of the majority class into account and avoids class overlap when synthesizing new samples. Experimental results show that the algorithm significantly improves performance across the evaluation indexes used.
Most existing clustering-based oversampling algorithms do not consider the spatial distribution of the majority class, so they tend to produce class overlap and to ignore important information points when synthesizing new samples. To solve this problem, this paper analyzes the influence of spatial distribution on the oversampling process and proposes an oversampling algorithm based on Adaptive Density Difference Peak Clustering and Spatial Distribution Entropy. First, the spatial distribution of the samples of both classes is introduced into the clustering process, and the minority class is clustered by the peak values of the local density difference, so that sub-cluster centers are selected in a principled way and class overlap is reduced. At the same time, the practice of setting the truncation distance from prior experience is changed: the spatial distribution of the two classes is characterized by constructing a Spatial Distribution Entropy, on the basis of which the truncation distance is selected and optimized automatically. Then, boundary points and sparse points are screened according to the absolute value of the local density difference, and the sampling probability of each minority class sample is determined so as to focus on these important information points. Finally, Spatial Distribution Entropy is used to evaluate the set of synthetic samples to ensure that they balance the distribution of the two classes in the dataset. To test the effectiveness of the algorithm, it is compared with five oversampling algorithms on four classifiers and 16 common datasets. The results show that, compared with SMOTE, K-means-SMOTE, BS-SMOTE, ADASYN, and DPC-SMOTE, the algorithm improves significantly on all evaluation indexes.
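
As a rough illustration of the density-difference idea described above, the following Python sketch computes a local density difference for each minority sample, turns its absolute value into sampling probabilities, and then generates synthetic points by SMOTE-style interpolation. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the cutoff-kernel density (a neighbor count within `d_c`), the weighting `1 / (1 + |delta|)`, the function names, and the parameters `d_c` and `k`. The Spatial Distribution Entropy and the automatic choice of the truncation distance are not modeled, and this is not the authors' implementation.

```python
import numpy as np


def local_density_difference(X_min, X_maj, d_c):
    """Minority-neighbor count minus majority-neighbor count within d_c.

    Assumption: a simple cutoff kernel stands in for the paper's density.
    """
    d_mm = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    d_mM = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2)
    rho_min = (d_mm < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    rho_maj = (d_mM < d_c).sum(axis=1)
    return rho_min - rho_maj


def sampling_probabilities(delta):
    """Assumption: a small |delta| marks a boundary or sparse point,
    so it receives a larger weight; weights are normalized to sum to 1."""
    weights = 1.0 / (1.0 + np.abs(delta))
    return weights / weights.sum()


def smote_like_oversample(X_min, probs, n_new, k=2, seed=0):
    """Draw seed points according to probs and interpolate toward one of
    their k nearest minority neighbors (standard SMOTE interpolation)."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    seeds = rng.choice(len(X_min), size=n_new, p=probs)
    picks = neighbours[seeds, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[seeds] + gaps * (X_min[picks] - X_min[seeds])
```

In this sketch a caller would pass the minority and majority samples as floating-point NumPy arrays of shape `(n_samples, n_features)`, compute `delta` once for a chosen `d_c`, and hand the resulting probabilities to `smote_like_oversample`. Because each synthetic point lies on a segment between two minority samples, it stays inside the bounding box of the minority class.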
