Article

Clustering-based feature subset selection with analysis on the redundancy-complementarity dimension

COMPUTER COMMUNICATIONS (2021)

Journal

COMPUTER COMMUNICATIONS

Volume 168, Pages 65-74

Publisher

ELSEVIER

DOI: 10.1016/j.comcom.2021.01.005

Keywords

Feature selection; Redundancy; Complementarity; Clustering; Minimum spanning tree

Categories

Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications

Funding

National Natural Science Foundation of China [U1764262, 52072288, 71702066]
Fundamental Research Funds for the Central Universities [WUT: 2020IVA007]
Hubei Province Innovation Group Project [2017CFA008]

AI Summary

This paper proposes CFSRCA, a feature subset selection algorithm that first selects candidate class-relevant features and then selects representative features among them. Classification experiments support its effectiveness.

Abstract

In the era of big data, dimensionality reduction plays an important role in many fields driven by machine learning and data mining techniques. Existing information-theoretic feature selection algorithms generally reduce the dimension by selecting the features with maximum class-relevance and minimum redundancy, but they tend to overlook the complementary correlation among features and sometimes handle it improperly. This paper proposes a novel feature subset selection algorithm called Clustering-based Feature Selection with Redundancy-Complementarity Analysis (CFSRCA). The proposed algorithm consists of two main steps: (a) selecting the candidate class-relevant features, and (b) selecting the representative features. In the second step, the representative features are defined as the features with minimum redundancy and maximum complementarity, and a clustering method based on the minimum spanning tree (MST) is proposed to identify them. To validate CFSRCA, it is compared with three feature selection algorithms (ReliefF, CFS, and FOU) using four well-known classifiers (C4.5, SVM, kNN, and NBC) in classification experiments on eight datasets. The experimental results verify the effectiveness of the proposed feature subset selection algorithm.
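The two-step structure described in the abstract can be illustrated with a minimal sketch. This is not the CFSRCA algorithm itself: the abstract does not give its relevance, redundancy, or complementarity measures, so the sketch substitutes symmetric uncertainty (an assumption) for both relevance and pairwise relatedness, omits the complementarity term entirely, and uses hypothetical names and thresholds (`select_features`, `relevance_min`, `cut`). It only shows the general shape of "filter by class relevance, then cluster features on an MST and keep one representative per cluster".

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * mutual_info / (hx + hy)


def select_features(features, labels, relevance_min=0.1, cut=0.5):
    """features: dict of name -> list of discrete values. Thresholds are assumed."""
    # Step (a): keep the candidate class-relevant features.
    relevance = {f: symmetric_uncertainty(v, labels) for f, v in features.items()}
    candidates = [f for f in features if relevance[f] > relevance_min]

    # Step (b): complete graph over candidates; a low weight marks a
    # strongly related (redundant) pair. This weighting is an assumption.
    edges = sorted(
        (1 - symmetric_uncertainty(features[a], features[b]), a, b)
        for a, b in combinations(candidates, 2)
    )

    parent = {f: f for f in candidates}

    def find(f):
        # Union-find root lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    # Kruskal's algorithm builds the minimum spanning tree.
    mst = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            mst.append((w, a, b))

    # Drop MST edges heavier than `cut`; the remaining trees are the clusters.
    parent = {f: f for f in candidates}
    for w, a, b in mst:
        if w <= cut:
            parent[find(a)] = find(b)

    clusters = {}
    for f in candidates:
        clusters.setdefault(find(f), []).append(f)

    # Representative of each cluster: its most class-relevant member.
    return sorted(max(members, key=relevance.get) for members in clusters.values())


a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
toy_features = {"a": a, "a_copy": list(a), "b": b, "noise": [0, 1] * 4}
toy_labels = [2 * x + y for x, y in zip(a, b)]
print(select_features(toy_features, toy_labels))  # ['a', 'b']
```

In this toy data the class depends on `a` and `b` jointly, `a_copy` duplicates `a`, and `noise` is independent of the class. The relevance filter drops `noise`, and the MST cut groups `a` with `a_copy` so that only one of them survives.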
