Article

Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification

Journal

INFORMATION SCIENCES
Volume 502, Pages 18-41

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2019.05.072

Keywords

Granular computing; Neighborhood rough sets; Feature selection; Neighborhood entropy; Uncertainty measure; Cancer classification

Funding

  1. National Natural Science Foundation of China [61772176, 61402153, 61370169, 61672332]
  2. China Postdoctoral Science Foundation [2016M602247]
  3. Plan of Scientific Innovation Talent of Henan Province [184100510003]
  4. Key Scientific and Technological Project of Henan Province [182102210362]
  5. Young Scholar Program of Henan Province [2017GGJS041]
  6. Natural Science Foundation of Henan Province [182300410130]

Gene expression data classification is an important technology for cancer diagnosis in bioinformatics and has been widely researched. Because gene expression data contain a large number of genes but only a small number of samples, feature selection based on neighborhood rough sets is a key step in improving classification performance. However, some quantitative measures of feature sets may be nonmonotonic in neighborhood rough sets, and many feature selection methods based on evaluation functions yield feature subsets of high cardinality with low predictive accuracy. It is therefore necessary to investigate effective and efficient heuristic reduction algorithms. This paper proposes a novel feature selection method for cancer classification from gene expression data, based on neighborhood rough sets and neighborhood entropy-based uncertainty measures. First, several neighborhood entropy-based uncertainty measures are investigated to handle the uncertainty and noise of neighborhood decision systems. Then, to fully reflect the decision-making ability of attributes, the neighborhood credibility degree and the neighborhood coverage degree are defined and introduced into the decision neighborhood entropy and mutual information, which are proven to be nonmonotonic. Moreover, several properties of these measures and the relationships among them are derived, which helps in understanding the essence of the knowledge content and the uncertainty of neighborhood decision systems. Finally, the Fisher score method is employed to preliminarily eliminate irrelevant genes, significantly reducing complexity, and a heuristic feature selection algorithm with low computational complexity is presented to improve the performance of cancer classification using gene expression data.
Experiments on ten gene expression datasets show that the proposed algorithm is efficient and outperforms other related methods in terms of both the number of selected genes and classification accuracy, especially as the number of genes increases.
© 2019 Published by Elsevier Inc.
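
The abstract relies on two standard building blocks that can be illustrated concretely: Fisher score pre-filtering of genes, and neighborhood entropy computed over a delta-neighborhood relation. The following is a minimal Python sketch under stated assumptions, not the paper's algorithm. It uses the widely used neighborhood rough set definitions NH(B) = -(1/|U|) Σ log2(|δ_B(x_i)| / |U|), NH(D|B) = -(1/|U|) Σ log2(|δ_B(x_i) ∩ [x_i]_D| / |δ_B(x_i)|) and NMI(B;D) = NH(D) - NH(D|B), with Euclidean distance on features assumed to be scaled to [0, 1]. The paper's credibility- and coverage-weighted measures are not reproduced, because their definitions are not given here. The helper names (fisher_scores, neighborhoods, neighborhood_measures, greedy_select, select_genes), the radius delta, the top_k cutoff and the rule "stop when NMI no longer increases" are all hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    n, m = len(X), len(X[0])
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(m):
        mu = sum(row[j] for row in X) / n
        between = within = 0.0
        for rows in by_class.values():
            nk = len(rows)
            mu_k = sum(r[j] for r in rows) / nk
            between += nk * (mu_k - mu) ** 2
            within += sum((r[j] - mu_k) ** 2 for r in rows)  # = nk * class variance
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def _distance(a, b, features):
    # Euclidean distance restricted to the chosen feature indices.
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def neighborhoods(X, features, delta):
    """delta-neighborhood (set of sample indices) of every sample under `features`."""
    n = len(X)
    return [{k for k in range(n) if _distance(X[i], X[k], features) <= delta}
            for i in range(n)]


def neighborhood_measures(X, y, features, delta):
    """Return (NH(B), NH(D|B), NMI(B;D)) in bits for feature subset B = features."""
    n = len(X)
    nbs = neighborhoods(X, features, delta)
    h_b = -sum(math.log2(len(nb) / n) for nb in nbs) / n
    # Each x_i lies in its own neighborhood and its own class, so no ratio is 0.
    h_d_given_b = -sum(
        math.log2(sum(1 for k in nb if y[k] == y[i]) / len(nb))
        for i, nb in enumerate(nbs)
    ) / n
    counts = Counter(y)
    h_d = -sum(math.log2(counts[label] / n) for label in y) / n
    return h_b, h_d_given_b, h_d - h_d_given_b


def greedy_select(X, y, candidates, delta, tol=1e-9):
    """Forward selection: repeatedly add the candidate that maximizes NMI(B;D).

    Stops when no remaining candidate raises NMI by more than tol.
    """
    selected, best = [], 0.0  # NMI of the empty subset is 0
    remaining = list(candidates)
    while remaining:
        score, f = max(
            ((neighborhood_measures(X, y, selected + [c], delta)[2], c) for c in remaining),
            key=lambda t: t[0],
        )
        if score - best <= tol:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected


def select_genes(X, y, top_k, delta):
    """Fisher-score pre-filter to the top_k features, then greedy NMI-based selection."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return greedy_select(X, y, ranked[:top_k], delta)
```

For example, with X = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.1], [0.8, 0.5, 0.8], [0.9, 0.6, 0.2]] and y = [0, 0, 1, 1], only the first feature separates the two classes, and select_genes(X, y, top_k=2, delta=0.2) returns [0]. The Fisher pre-filter is cheap (linear in the number of samples per feature), which is why it is applied before the quadratic neighborhood computations.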