Article

Clustering-Guided Particle Swarm Feature Selection Algorithm for High-Dimensional Imbalanced Data With Missing Values

Journal

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
Volume 26, Issue 4, Pages 616-630

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TEVC.2021.3106975

Keywords

Class imbalance; feature selection (FS); fuzzy clustering; missing value; particle swarm optimization (PSO)

Funding

  1. National Natural Science Foundation of China [61876185, 61876184, 61973305]

This article proposes a new evolutionary feature selection (FS) method for high-dimensional imbalanced data with missing values. An improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on FS performance, and it serves as the objective function of a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC). Experimental results show that PSOFS-FC achieves excellent classification performance with relatively short running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
Feature selection (FS) for data with class imbalance or missing values has received much attention from researchers because both characteristics are common in real-world applications. However, FS algorithms designed for data that exhibit both characteristics at once are still lacking. Because missing data and class imbalance are coupled in complex ways, a better FS method is needed. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on FS performance under class imbalance. Then, taking the RF-measure as the objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific strategies, namely a swarm initialization strategy guided by fuzzy clustering and a local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. In comparisons with state-of-the-art FS algorithms on several public datasets, experimental results show that PSOFS-FC achieves excellent classification performance with relatively short running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
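
The abstract does not give the RF-measure formula, the fuzzy-clustering-guided initialization, or the local pruning operator, so none of them is reproduced here. As a rough illustration of the general idea only, the Python sketch below pairs the standard F-measure (the usual score for the minority class under imbalance) with a generic sigmoid-based binary PSO that searches over feature masks. The names `f_measure`, `binary_pso`, and `toy_fitness`, all parameter values, and the random swarm initialization are assumptions made for this example, not the authors' method.

```python
import math
import random


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Standard F-measure for the positive (minority) class.

    This is the plain F-measure, NOT the paper's RF-measure, whose
    filling-risk term is not specified in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic binary PSO: maximizes fitness(mask) over 0/1 feature masks."""
    rng = random.Random(seed)
    # Assumption: uniform random initialization. The paper instead guides
    # initialization with fuzzy clustering, which is not shown here.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                vel[i][d] = v
                # The sigmoid turns velocity into the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask, target=(1, 0, 1, 0, 0, 1)):
    """Hypothetical stand-in objective: bits that agree with a fixed target."""
    return sum(1 for m, t in zip(mask, target) if m == t)


best_mask, best_fit = binary_pso(toy_fitness, n_features=6)
```

In a real wrapper-style FS setting, `toy_fitness` would be replaced by a function that trains a classifier on the selected columns and returns `f_measure` on held-out data. Handling missing values, as the RF-measure is said to do, would need the definitions given in the full paper.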
