Article

CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data

Publisher

ELSEVIER
DOI: 10.1016/j.jksuci.2023.101731

Keywords

Gene expression data; Clustering-guided; Unsupervised feature selection; k-means; Spectral clustering


This paper proposes a clustering-guided unsupervised feature selection algorithm for gene expression data. It targets three problems in existing algorithms: the need to specify the number of clusters manually, the failure to account for redundancy among features, and the inability to filter redundant features. The proposed algorithm introduces an adaptive k-value strategy, a feature grouping strategy, and an adaptive filtering strategy to select significant disease-related features. Experimental results show that the algorithm outperforms existing algorithms in terms of accuracy and the Matthews correlation coefficient.
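As a rough illustration of what grouping highly redundant features can look like, the sketch below greedily places each feature into the first group whose representative it correlates with strongly. The use of absolute Pearson correlation, the 0.9 threshold, and the names `pearson` and `group_redundant_features` are assumptions made for this example; the paper's actual grouping criterion is not stated in this summary and may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_redundant_features(columns, threshold=0.9):
    """Greedy grouping: each feature joins the first group whose representative
    (the group's first member) has |correlation| >= threshold with it.
    The threshold is a hypothetical choice, not a value taken from the paper."""
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if abs(pearson(columns[group[0]], col)) >= threshold:
                group.append(j)
                break
        else:
            # No sufficiently similar representative was found: start a new group.
            groups.append([j])
    return groups


# Toy expression matrix, one list per feature (gene) across four samples.
features = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],   # perfectly correlated with feature 0
    [4, 1, 3, 2],   # weakly correlated with feature 0
    [8, 6, 4, 2],   # perfectly anti-correlated with feature 0
]
groups = group_redundant_features(features)
print(groups)
```

A filtering step would then keep only some members of each group. How many to keep per group is exactly what the adaptive filtering strategy is described as deciding, and this sketch does not attempt it.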
(Aim) Gene expression data is typically high-dimensional with a limited number of samples and contains many features that are unrelated to the disease of interest. Existing unsupervised feature selection algorithms primarily focus on the significance of features in maintaining the data structure and do not take into account the redundancy among features. Determining the appropriate number of significant features is another challenge. (Method) In this paper, we propose a clustering-guided unsupervised feature selection (CGUFS) algorithm for gene expression data that addresses these problems. Our proposed algorithm introduces three improvements over existing algorithms. Because existing clustering algorithms require the number of clusters to be specified manually, we propose an adaptive k-value strategy that assigns appropriate pseudo-labels to each sample by iteratively updating a change function. Because existing algorithms fail to consider the redundancy among features, we propose a feature grouping strategy that groups highly redundant features. Because existing algorithms cannot filter the redundant features, we propose an adaptive filtering strategy that determines the feature combinations to be retained by calculating the potentially effective features and potentially redundant features of each feature group. (Result) Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) of the C4.5 classifier on the optimal features selected by the CGUFS algorithm reach 74.37% and 63.84%, respectively, significantly superior to the existing algorithms. (Conclusion) Similarly, the average ACC and MCC of the AdaBoost classifier on the optimal features selected by the CGUFS algorithm are significantly superior to the existing algorithms.
In addition, statistical experiment results show significant differences between the CGUFS algorithm and the existing algorithms. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
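For readers unfamiliar with the two reported indexes, the following minimal sketch computes ACC and MCC from binary predictions using their standard definitions. The function names and toy labels are made up for illustration, and the binary setting is an assumption: the datasets evaluated in the paper may be multi-class, which needs the generalized form of MCC.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """ACC = (TP + TN) / total number of samples."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels: 1 = diseased, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), mcc(*counts))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it is less easily inflated by class imbalance than accuracy alone.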

