4.7 Article

A classifier ensemble approach for the missing feature problem

期刊

ARTIFICIAL INTELLIGENCE IN MEDICINE
卷 55, 期 1, 页码 37-50

出版社

ELSEVIER
DOI: 10.1016/j.artmed.2011.11.006

关键词

Missing values; Imputation methods; Support vector machine; Fuzzy clustering; Data corruption; Equipment malfunctions

向作者/读者索取更多资源

Objectives: Many classification problems must deal with data that contains missing values. In such cases data imputation is critical. This paper evaluates the performance of several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, using different datasets. Materials and methods: Several state-of-the-art approaches are compared using different datasets. Some state-of-the-art classifiers (including support vector machines and input decimated ensembles) are tested with several imputation methods. The novel approach proposed in this work is a multiple imputation method based on random subspace, where each missing value is calculated considering a different cluster of the data. We have used a fuzzy clustering approach for the clustering algorithm. Results: Our experiments have shown that the proposed multiple imputation approach based on clustering and a random subspace classifier outperforms several other state-of-the-art approaches. Using the Wilcoxon signed-rank test (reject the null hypothesis, level of significance 0.05) we have shown that the proposed best approach is outperformed by the classifier trained using the original data (i.e., without missing values) only when >20% of the data are missed. Moreover, we have shown that coupling an imputation method with our cluster based imputation we outperform the base method (level of significance similar to 0.05). Conclusion: Starting from the assumptions that the feature set must be partially redundant and that the redundancy is distributed randomly over the feature set, we have proposed a method that works quite well even when a large percentage of the features is missing (>= 30%). Our best approach is available (MATLAB code) at bias.csr.unibo.it/nanni/MI.rar. (C) 2011 Elsevier B.V. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据