Article

Feature selection with missing data using mutual information estimators

NEUROCOMPUTING (2012)

Journal

NEUROCOMPUTING

Volume 90, Issue -, Pages 3-11

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2012.02.031

Keywords

Feature selection; Missing data; Mutual information

Category

Computer Science, Artificial Intelligence

Funding

Belgian F.R.I.A.

Abstract

Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some feature values are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbor-based mutual information estimator can be extended to handle missing data. This estimator has the advantage over traditional ones that it does not directly estimate any probability density function. Consequently, the mutual information may be reliably estimated even when the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without the need for any imputation algorithm, under the assumption that data are missing completely at random. Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high. © 2012 Elsevier B.V. All rights reserved.
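
To make the idea in the abstract concrete, the following is a minimal sketch of ranking features by a nearest-neighbor mutual information estimate without imputing anything. It is not the estimator extension described in the paper, whose details the abstract does not give. It assumes the Kraskov-Stögbauer-Grassberger (KSG) estimator as one well-known nearest-neighbor estimator, scores each feature on its own, and simply drops the samples where that feature is missing, which is only reasonable if data are missing completely at random. The names `digamma_int`, `ksg_mi`, and `rank_features` and the synthetic data are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def ksg_mi(x, y, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for two 1-D samples.

    Brute-force O(N^2 log N) neighbor search; assumes continuous values (no ties).
    """
    n = len(x)
    if n <= k:
        raise ValueError("need more than k samples")
    total = 0.0
    for i in range(n):
        # Chebyshev (max-norm) distances from point i in the joint (x, y) space.
        dists = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count points strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    estimate = digamma_int(k) + digamma_int(n) - total / n
    return max(0.0, estimate)  # true MI is non-negative; clip small negative estimates

def rank_features(rows, target, k=3):
    """Score each feature using only the samples where it is observed (None = missing)."""
    scores = []
    for f in range(len(rows[0])):
        observed = [(r[f], t) for r, t in zip(rows, target) if r[f] is not None]
        xs = [p[0] for p in observed]
        ys = [p[1] for p in observed]
        scores.append((f, ksg_mi(xs, ys, k)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Synthetic illustration: feature 0 drives the target, feature 1 is pure noise.
random.seed(0)
rows, target = [], []
for _ in range(200):
    x0, x1 = random.gauss(0, 1), random.gauss(0, 1)
    target.append(x0 + 0.1 * random.gauss(0, 1))
    # Remove about 20% of the entries completely at random.
    rows.append([v if random.random() > 0.2 else None for v in (x0, x1)])

ranking = rank_features(rows, target)
for feature, mi in ranking:
    print(f"feature {feature}: estimated MI = {mi:.3f} nats")
```

Dropping incomplete samples per feature keeps the sketch short, but it only works for one feature at a time: scoring a subset of features jointly would discard every sample missing any of them, which is the kind of loss a missing-data-aware estimator is meant to avoid.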
