Journal
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
Volume 26, Issue 4, Pages 616-630
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TEVC.2021.3106975
Keywords
Class imbalance; feature selection (FS); fuzzy clustering; missing value; particle swarm optimization (PSO)
Funding
- National Natural Science Foundation of China [61876185, 61876184, 61973305]
This article proposes a new evolutionary feature selection (FS) method for high-dimensional imbalanced data with missing values. An improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on FS performance, and it serves as the objective function of a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC). Experimental results show that PSOFS-FC achieves excellent classification performance with relatively little running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers because both characteristics are common in real-world applications. However, FS algorithms designed for data that exhibit both characteristics at once are still lacking. Because missing data and class imbalance are coupled in complex ways, a better FS method is needed. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on the performance of FS under class imbalance. Then, taking the RF-measure as the objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific strategies, namely a swarm initialization strategy guided by fuzzy clustering and a local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. In comparisons with state-of-the-art FS algorithms on several public datasets, experimental results show that PSOFS-FC achieves excellent classification performance with relatively little running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
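To make the general idea concrete, the following is a minimal sketch of PSO-driven feature selection with an F-measure objective on imbalanced data. It is not the authors' PSOFS-FC: the abstract does not define the RF-measure, the fuzzy-clustering initialization, or the pruning operator, so this sketch assumes the standard F-measure, uniform random initialization, a 0.5 threshold to decode particles into feature masks, and a nearest-centroid classifier scored on the training data. All function names, parameter values, and the toy dataset are hypothetical, and missing values are not handled.

```python
import numpy as np


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Standard F-measure for the positive (minority) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    b2 = beta ** 2
    return float((1 + b2) * precision * recall / (b2 * precision + recall))


def fitness(mask, X, y):
    """Assumed stand-in objective: F-measure of a nearest-centroid
    classifier restricted to the selected features."""
    if not mask.any():
        return 0.0  # an empty feature subset is treated as worst case
    Xs = X[:, mask]
    dist_major = np.linalg.norm(Xs - Xs[y == 0].mean(axis=0), axis=1)
    dist_minor = np.linalg.norm(Xs - Xs[y == 1].mean(axis=0), axis=1)
    return f_measure(y, (dist_minor < dist_major).astype(int))


def pso_feature_selection(X, y, n_particles=20, n_iter=30, seed=0):
    """Plain global-best PSO; a feature is selected when its position > 0.5."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, X.shape[1]))  # uniform random init (assumption)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p > 0.5, X, y) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    history = [gbest_fit]
    inertia, c_cog, c_soc = 0.7, 1.5, 1.5  # illustrative coefficients
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c_cog * r1 * (pbest - pos) + c_soc * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([fitness(p > 0.5, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
        history.append(gbest_fit)
    return gbest > 0.5, gbest_fit, history


def make_toy_data(seed=0, n_major=180, n_minor=20, n_features=30, n_informative=3):
    """Synthetic 9:1 imbalanced data; only the first few features are informative."""
    rng = np.random.default_rng(seed)
    y = np.concatenate([np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)])
    X = rng.normal(size=(y.size, n_features))
    X[y == 1, :n_informative] += 2.0  # shift the minority class on informative features
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, best_fit, _ = pso_feature_selection(X, y)
    print("selected features:", np.flatnonzero(mask))
    print("best F-measure:", round(best_fit, 3))
```

The F-measure is used here instead of accuracy because, with a 9:1 class ratio, a classifier that always predicts the majority class scores 90% accuracy but an F-measure of 0 on the minority class. A faithful reproduction would replace `fitness` with the paper's RF-measure and evaluate on held-out data.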