Article

An Ensemble Learning Algorithm Based on Density Peaks Clustering and Fitness for Imbalanced Data

Journal

IEEE ACCESS
Volume 10, Pages 116120-116128

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3219582

Keywords

Classification algorithms; Clustering algorithms; Probability; Ensemble learning; Partitioning algorithms; Machine learning algorithms; Decision trees; Imbalanced data; density peaks clustering; fitness; under-sampling; classification

Funding

  1. National Natural Science Foundation of China [62172351]

This paper proposes an algorithm based on density peaks clustering and fitness to address the low classification accuracy of the minority class in imbalanced data. Experimental results on KEEL imbalanced data sets show that it outperforms the other algorithms compared.
To address the low classification accuracy of the minority class in imbalanced data, an algorithm called DPF-EL (density peaks and fitness combined with ensemble learning), based on density peaks clustering and fitness, is proposed. First, the density peaks clustering algorithm divides the majority class into sub-clusters; the local density calculated during clustering is used to assign a weight to each sub-cluster, and these weights determine how many samples are drawn from each sub-cluster during under-sampling. Second, the concept of fitness is introduced within the sub-clusters: the selection probability of each sample is calculated from its fitness, and the majority class is under-sampled according to these probabilities. Finally, a boosting algorithm is trained iteratively on the balanced data set. Experiments on KEEL imbalanced data sets show that DPF-EL performs better than the other algorithms compared, which indicates the feasibility of the proposed algorithm.
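The under-sampling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weight or fitness formulas, so the sketch assumes that a sub-cluster's weight is its share of the total local density, that a sample's fitness equals its local density, and that selection is fitness-proportionate (roulette-wheel) sampling without replacement. The function names, the Gaussian kernel, and the cutoff `d_c` are illustrative choices, and the sub-cluster labels are taken as given rather than computed by density peaks clustering. The boosting stage is omitted.

```python
import numpy as np

def local_density(X, d_c):
    """Gaussian-kernel local density of each point (a common density peaks variant)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    kernel = np.exp(-(dists / d_c) ** 2)
    np.fill_diagonal(kernel, 0.0)  # a point does not contribute to its own density
    return kernel.sum(axis=1)

def allocate_quota(rho, labels, n_total):
    """Split n_total draws across sub-clusters by their share of summed density.

    Assumption: weight = summed local density of the sub-cluster / total density.
    """
    clusters = np.unique(labels)
    weights = np.array([rho[labels == c].sum() for c in clusters], dtype=float)
    weights = weights / weights.sum()
    sizes = np.array([(labels == c).sum() for c in clusters])
    # Never request more samples than a sub-cluster contains.
    quota = np.minimum(np.round(weights * n_total).astype(int), sizes)
    return dict(zip(clusters.tolist(), quota.tolist()))

def undersample(X, labels, n_total, d_c=1.0, seed=0):
    """Return sorted indices of the majority-class samples that are kept."""
    rng = np.random.default_rng(seed)
    rho = local_density(X, d_c)
    quota = allocate_quota(rho, labels, n_total)
    chosen = []
    for c, k in quota.items():
        if k == 0:
            continue
        idx = np.flatnonzero(labels == c)
        # Assumption: fitness = local density; epsilon keeps every probability positive.
        fitness = rho[idx] + 1e-12
        prob = fitness / fitness.sum()
        chosen.append(rng.choice(idx, size=k, replace=False, p=prob))
    if not chosen:
        return np.array([], dtype=int)
    return np.sort(np.concatenate(chosen))
```

Calling `undersample(X_majority, sub_cluster_labels, n_total=len(X_minority))` would return the indices of a majority-class subset roughly the size of the minority class; because each quota is rounded separately, the total can differ from `n_total` by a few samples. The reduced majority set and the minority class could then be passed to any boosting classifier.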
