Article

Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification

INFORMATION SCIENCES (2019)

Journal

INFORMATION SCIENCES

Volume 502, Pages 18-41

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2019.05.072

Keywords

Granular computing; Neighborhood rough sets; Feature selection; Neighborhood entropy; Uncertainty measure; Cancer classification

Category

Computer Science, Information Systems

Funding

National Natural Science Foundation of China [61772176, 61402153, 61370169, 61672332]
China Postdoctoral Science Foundation [2016M602247]
Plan of Scientific Innovation Talent of Henan Province [184100510003]
Key Scientific and Technological Project of Henan Province [182102210362]
Young Scholar Program of Henan Province [2017GGJS041]
Natural Science Foundation of Henan Province [182300410130]

Abstract

Gene expression data classification is an important technology for cancer diagnosis in bioinformatics and has been widely researched. Due to the large number of genes and the small sample size in gene expression data, feature selection based on neighborhood rough sets is a key step for improving the performance of gene expression data classification. However, some quantitative measures of feature sets may be nonmonotonic in neighborhood rough sets, and many feature selection methods based on evaluation functions yield high cardinality and low predictive accuracy. Therefore, investigating effective and efficient heuristic reduction algorithms is necessary. In this paper, a novel feature selection method based on neighborhood rough sets using neighborhood entropy-based uncertainty measures for cancer classification from gene expression data is proposed. First, some neighborhood entropy-based uncertainty measures are investigated for handling the uncertainty and noise of neighborhood decision systems. Then, to fully reflect the decision making ability of attributes, the neighborhood credibility and neighborhood coverage degrees are defined and introduced into decision neighborhood entropy and mutual information, which are proven to be nonmonotonic. Moreover, some of the properties and relationships among these measures are derived, which is helpful for understanding the essence of the knowledge content and the uncertainty of neighborhood decision systems. Finally, the Fisher score method is employed to preliminarily eliminate irrelevant genes to significantly reduce complexity, and a heuristic feature selection algorithm with low computational complexity is presented to improve the performance of cancer classification using gene expression data. 
Experiments on ten gene expression datasets show that the proposed algorithm is efficient and outperforms other related methods in terms of the number of selected genes and the classification accuracy, especially as the number of genes increases. © 2019 Published by Elsevier Inc.
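
The two building blocks named in the abstract, a Fisher score pre-filter and a neighborhood-based entropy, can be illustrated with a minimal sketch. This is not the authors' code: the abstract gives no formulas, so the sketch assumes the standard Fisher score (between-class scatter divided by within-class scatter) and the plain neighborhood entropy commonly used in the neighborhood rough set literature, -(1/|U|) Σ log2(|δ(x_i)| / |U|), with Euclidean distance and a radius `delta`. The paper's decision neighborhood entropy with credibility and coverage degrees is not reproduced here. The function names `fisher_scores`, `top_k_genes`, and `neighborhood_entropy` are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Standard Fisher score per feature (assumed definition, not taken from the paper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        # Between-class scatter: class mean versus overall mean, weighted by class size.
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        # Within-class scatter: population variance inside the class, weighted by class size.
        within += n_c * Xc.var(axis=0)
    # Guard against division by zero for features that are constant within every class.
    return between / np.maximum(within, 1e-12)


def top_k_genes(X, y, k):
    """Indices of the k features with the highest Fisher score, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]


def neighborhood_entropy(X, delta):
    """Plain neighborhood entropy under Euclidean distance (assumed definition)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise distances; the neighborhood of x_i holds every sample within delta of it.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    sizes = (dists <= delta).sum(axis=1)
    return -np.mean(np.log2(sizes / n))


if __name__ == "__main__":
    # Toy data: 4 samples, 2 "genes"; only gene 0 separates the two classes.
    X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 5.0], [1.1, 1.0]])
    y = np.array([0, 0, 1, 1])
    print(fisher_scores(X, y))
    print(top_k_genes(X, y, k=1))
    print(neighborhood_entropy(X[:, [0]], delta=0.2))
```

In a pipeline of the kind the abstract outlines, the pre-filter would run first to shrink the gene set, and an entropy-based measure would then guide a heuristic search over the remaining genes. The pairwise-distance matrix here costs O(n²) memory, which is acceptable for the small sample sizes typical of gene expression data.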
