4.7 Article

Accelerating information entropy-based feature selection using rough set theory with classified nested equivalence classes

Journal

PATTERN RECOGNITION
Volume 107

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107517

Keywords

Feature selection; Rough set theory; Attribute reduction; Information entropy

Funding

  1. National Natural Science Foundation of China [71871069, 71401045, 61976239, 71571052]
  2. Ministry of Education in China Project of Humanities and Social Sciences [18YJAZH137]
  3. Natural Science Foundation of Guangdong Province [2017A030313394, 2020A1515010783, 2016A030310300]
  4. Major Scientific Research Projects of Guangdong [2017WTSCX021]
  5. 13th Five-Year Plan for the Development of Philosophy and Social Sciences of Guangzhou [2018GZGJ48]


Feature selection effectively reduces the dimensionality of data. For feature selection, rough set theory offers a systematic theoretical framework based on consistency measures, among which information entropy is one of the most important significance measures of attributes. However, an information-entropy-based significance measure is computationally expensive and requires repeated calculations. Although many accelerating strategies have been proposed thus far, a bottleneck remains when using an information-entropy-based feature selection algorithm to handle large-scale, high-dimensional datasets. In this study, we introduce a classified nested equivalence class (CNEC)-based approach to calculate the information-entropy-based significance for feature selection using rough set theory. The proposed method extracts knowledge of the reducts of a decision table to reduce the universe and construct CNECs. By exploring the properties of different types of CNECs, we can not only accelerate both outer and inner significance calculations by discarding useless CNECs but also effectively decrease the number of inner significance calculations by using one type of CNEC. The use of CNECs is shown to significantly enhance three representative entropy-based feature selection algorithms that use rough set theory. The feature subset selected by the CNEC-based algorithms is the same as that selected by algorithms using the original definitions of the information entropies. Experiments conducted on 31 datasets from multiple sources, such as the UCI repository and the KDD Cup competition, including large-scale and high-dimensional datasets, confirm the efficiency and effectiveness of the proposed method. © 2020 Elsevier Ltd. All rights reserved.
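
For orientation, below is a minimal Python sketch of the baseline (non-accelerated) conditional-entropy significance measure and greedy forward selection that this line of work builds on. It does not reproduce the paper's CNEC acceleration; names such as conditional_entropy and outer_significance are illustrative assumptions, not the authors' implementation.

# Baseline entropy-based feature selection on a decision table (illustrative sketch).
from collections import Counter, defaultdict
from math import log2

def partition(rows, attrs):
    """Group row indices into equivalence classes induced by the attribute set."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def conditional_entropy(rows, attrs, decision):
    """H(D | attrs): decision entropy within each equivalence class, weighted by class size
    (one common entropy form used in rough set feature selection)."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        p_block = len(block) / n
        h -= p_block * sum((c / len(block)) * log2(c / len(block))
                           for c in counts.values())
    return h

def outer_significance(rows, reduct, a, decision):
    """Entropy drop obtained by adding attribute a to the current reduct."""
    return (conditional_entropy(rows, reduct, decision)
            - conditional_entropy(rows, reduct + [a], decision))

def forward_selection(rows, cond_attrs, decision):
    """Greedy forward selection: repeatedly add the attribute with the largest
    outer significance until no attribute further reduces H(D | reduct)."""
    reduct, remaining = [], list(cond_attrs)
    while remaining:
        best = max(remaining, key=lambda a: outer_significance(rows, reduct, a, decision))
        if outer_significance(rows, reduct, best, decision) <= 1e-12:
            break
        reduct.append(best)
        remaining.remove(best)
    return reduct

# Tiny toy decision table: columns 0-2 are condition attributes, column 3 is the decision.
table = [
    (0, 1, 0, "yes"), (0, 1, 1, "yes"), (1, 0, 0, "no"),
    (1, 1, 0, "no"),  (0, 0, 1, "yes"), (1, 0, 1, "no"),
]
print(forward_selection(table, cond_attrs=[0, 1, 2], decision=3))  # -> [0]

The repeated re-partitioning of the universe inside conditional_entropy is exactly the cost that acceleration schemes such as the paper's CNEC construction aim to cut down.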

