The optimal combination of feature selection and data discretization: An empirical study

INFORMATION SCIENCES (2019)

Journal

INFORMATION SCIENCES

Volume 505, Issue -, Pages 282-293

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2019.07.091

Keywords

Data mining; Discretization; Feature selection; Machine learning

Categories

Computer Science, Information Systems

Funding

Ministry of Science and Technology of Taiwan [MOST 108-2410-H-008-063-MY3]

Abstract

Feature selection and data discretization are two important data pre-processing steps in data mining: the former filters out unrepresentative features, and the latter transforms continuous attributes into discrete ones. In the literature, these two problems have often been studied individually. However, their combination has not been fully explored, although some real-world datasets may require both feature selection and discretization. In this paper, the two possible orders of combining feature selection and discretization are examined in terms of classification accuracy and computational time. Specifically, filter, wrapper, and embedded feature selection methods are employed, namely PCA, GA, and C4.5, respectively. For discretization, both supervised and unsupervised discretizers are used, specifically MDLP, ChiMerge, equal frequency binning, and equal width binning. The experimental results, based on 10 UCI datasets, show that, for the SVM classifier, performing MDLP first and C4.5 second outperforms the other combinations: it requires less computational time and also provides the highest classification accuracy. For the decision tree classifier, performing C4.5 first and MDLP second is recommended. © 2019 Elsevier Inc. All rights reserved.
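To make the two unsupervised discretizers named in the abstract concrete, here is a minimal sketch of equal width and equal frequency binning in plain Python. It is an illustration only, not the authors' implementation: the function names, the toy values, and the choice of four bins are all assumptions made for this example, and the supervised discretizers (MDLP, ChiMerge) are not covered.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to a bin index in 0..len(edges); a value equal to an edge goes to the upper bin."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical toy attribute with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]

width_bins = discretize(values, equal_width_edges(values, 4))
freq_bins = discretize(values, equal_frequency_edges(values, 4))

print(width_bins)  # [0, 0, 0, 0, 0, 0, 0, 3]
print(freq_bins)   # [0, 0, 1, 1, 2, 2, 3, 3]
```

On this toy attribute the outlier stretches the equal width intervals so that seven of the eight values fall into the first bin, while equal frequency binning places two values in each bin. With heavily tied values the equal frequency cut points can coincide, which this sketch does not handle.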
