4.6 Article

A Fast Hybrid Feature Selection Based on Correlation-Guided Clustering and Particle Swarm Optimization for High-Dimensional Data

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 52, Issue 9, Pages 9573-9586

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2021.3061152

Keywords

Clustering algorithms; Computational efficiency; Particle swarm optimization; Feature extraction; Search problems; Convergence; Mutual information; Clustering; feature selection (FS); hybrid search; particle swarm optimization (PSO)

Funding

  1. National Natural Science Foundation of China [61876185, 51875113]
  2. Scientific Innovation 2030 Major Project for New Generation of AI, Ministry of Science and Technology of the People's Republic of China [2020AAA0107300]

This study proposes a new three-phase hybrid feature selection algorithm that effectively integrates three different feature selection methods to address the "curse of dimensionality" and high computational cost in high-dimensional feature selection problems. Experimental results demonstrate that the algorithm performs well in obtaining good feature subsets with the lowest computational cost on 18 real-world datasets.
The "curse of dimensionality" and high computational cost still limit the application of evolutionary algorithms to high-dimensional feature selection (FS) problems. This article proposes a new three-phase hybrid FS algorithm based on correlation-guided clustering and particle swarm optimization (PSO), called HFS-C-P, to tackle both problems at the same time. To this end, three kinds of FS methods are integrated into the proposed algorithm according to their respective advantages. In the first and second phases, a filter FS method and a feature clustering-based method, both with low computational cost, are designed to reduce the search space used by the third phase. The third phase then focuses on finding an optimal feature subset by using an evolutionary algorithm with global search ability. Moreover, a symmetric uncertainty-based feature deletion method, a fast correlation-guided feature clustering strategy, and an improved integer PSO are developed to improve the performance of the three phases, respectively. Finally, the proposed algorithm is validated on 18 publicly available real-world datasets in comparison with nine FS algorithms. Experimental results show that the proposed algorithm can obtain a good feature subset with the lowest computational cost.
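
To illustrate the kind of filter step used in the first phase, the following is a minimal sketch of symmetric uncertainty (SU) for discrete features, using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is not the authors' implementation: the function names, the fixed threshold of 0.1, and the rule "keep features whose SU with the class label exceeds the threshold" are assumptions made here for illustration, and the abstract does not state how the paper sets its deletion criterion.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so nothing is shared
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)


def filter_features(columns, labels, threshold=0.1):
    """Return indices of features whose SU with the labels exceeds threshold.

    The threshold value and this keep/delete rule are hypothetical.
    """
    return [
        i for i, col in enumerate(columns)
        if symmetric_uncertainty(col, labels) > threshold
    ]


labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # identical to the labels
    [0, 1, 0, 1],  # independent of the labels
    [0, 0, 0, 0],  # constant feature
]
kept = filter_features(columns, labels)
```

In this toy example only the first feature survives the filter. Continuous features would need to be discretized before this entropy estimate applies.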
