4.7 Article

Accelerating information entropy-based feature selection using rough set theory with classified nested equivalence classes

Journal

PATTERN RECOGNITION
Volume 107

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107517

Keywords

Feature selection; Rough set theory; Attribute reduction; Information entropy

Funding

  1. National Natural Science Foundation of China [71871069, 71401045, 61976239, 71571052]
  2. Ministry of Education in China Project of Humanities and Social Sciences [18YJAZH137]
  3. Natural Science Foundation of Guangdong Province [2017A030313394, 2020A1515010783, 2016A030310300]
  4. Major Scientific Research Projects of Guangdong [2017WTSCX021]
  5. 13th Five-Year Plan for the Development of Philosophy and Social Sciences of Guangzhou [2018GZGJ48]


Feature selection effectively reduces the dimensionality of data. For feature selection, rough set theory offers a systematic theoretical framework based on consistency measures, among which information entropy is one of the most important significance measures of attributes. However, an information-entropy-based significance measure is computationally expensive and requires repeated calculations. Although many accelerating strategies have been proposed thus far, a bottleneck remains when using an information-entropy-based feature selection algorithm to handle large-scale, high-dimensional datasets. In this study, we introduce a classified nested equivalence class (CNEC)-based approach to calculate the information-entropy-based significance for feature selection using rough set theory. The proposed method extracts knowledge of the reducts of a decision table to reduce the universe and construct CNECs. By exploring the properties of different types of CNECs, we can not only accelerate both outer and inner significance calculations by discarding useless CNECs but also effectively decrease the number of inner significance calculations by using one type of CNEC. The use of CNECs is shown to significantly enhance three representative entropy-based feature selection algorithms that use rough set theory. The feature subset selected by the CNEC-based algorithms is the same as that selected by algorithms using the original definitions of the information entropies. Experiments conducted on 31 datasets from multiple sources, such as the UCI repository and the KDD Cup competition, including large-scale and high-dimensional datasets, confirm the efficiency and effectiveness of the proposed method. © 2020 Elsevier Ltd. All rights reserved.
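
For orientation, below is a minimal Python sketch of the baseline (non-accelerated) conditional-entropy significance measure and greedy forward selection that this line of work builds on. It does not reproduce the paper's CNEC acceleration; names such as conditional_entropy and outer_significance are illustrative assumptions, not the authors' implementation.

# Baseline entropy-based feature selection on a decision table (illustrative sketch).
from collections import Counter, defaultdict
from math import log2

def partition(rows, attrs):
    """Group row indices into equivalence classes induced by the attribute set."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def conditional_entropy(rows, attrs, decision):
    """H(D | attrs): decision entropy within each equivalence class, weighted by class size
    (one common entropy form used in rough set feature selection)."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        p_block = len(block) / n
        h -= p_block * sum((c / len(block)) * log2(c / len(block))
                           for c in counts.values())
    return h

def outer_significance(rows, reduct, a, decision):
    """Entropy drop obtained by adding attribute a to the current reduct."""
    return (conditional_entropy(rows, reduct, decision)
            - conditional_entropy(rows, reduct + [a], decision))

def forward_selection(rows, cond_attrs, decision):
    """Greedy forward selection: repeatedly add the attribute with the largest
    outer significance until no attribute further reduces H(D | reduct)."""
    reduct, remaining = [], list(cond_attrs)
    while remaining:
        best = max(remaining, key=lambda a: outer_significance(rows, reduct, a, decision))
        if outer_significance(rows, reduct, best, decision) <= 1e-12:
            break
        reduct.append(best)
        remaining.remove(best)
    return reduct

# Tiny toy decision table: columns 0-2 are condition attributes, column 3 is the decision.
table = [
    (0, 1, 0, "yes"), (0, 1, 1, "yes"), (1, 0, 0, "no"),
    (1, 1, 0, "no"),  (0, 0, 1, "yes"), (1, 0, 1, "no"),
]
print(forward_selection(table, cond_attrs=[0, 1, 2], decision=3))  # -> [0]

The repeated re-partitioning of the universe inside conditional_entropy is exactly the cost that acceleration schemes such as the paper's CNEC construction aim to cut down.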

