4.7 Article

Selection-fusion approach for classification of datasets with missing values

Journal

PATTERN RECOGNITION
Volume 43, Issue 6, Pages 2340-2350

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2009.12.003

Keywords

Missing value management; Subspace classifiers; Ensemble classifiers; Multiple imputations; Pruning; Support vector machine (SVM)

Funding

  1. NIH [R01-EB002450]
  2. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [0751045]

This paper proposes a new approach, based on missing value pattern discovery, for classifying incomplete data. The approach is designed specifically for datasets with a small number of samples and a high percentage of missing values, where existing missing value treatment approaches usually do not work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available, trains a classifier for each subset, and then combines the outputs of the classifiers. Subset selection is translated into a clustering problem, which allows a mathematical framework to be derived for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for predicting the overall performance. The proposed method is applied to eight datasets: seven from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan. Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputations method and of four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. (C) 2009 Elsevier Ltd. All rights reserved.
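
The selection-fusion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumed: feature subsets are taken directly from the distinct missing-value patterns in the training data (the paper instead casts subset selection as a clustering problem), the base classifier is a toy nearest-centroid model rather than an SVM, and outputs are fused by plain majority vote. All function and class names are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Toy base classifier used as a stand-in (assumption: the paper uses others, e.g. SVMs)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from each sample to each class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def fit_subset_ensemble(X, y):
    """Train one classifier per distinct availability mask (True = feature observed)."""
    observed = ~np.isnan(X)
    ensemble = []
    for mask in np.unique(observed, axis=0):
        if not mask.any():
            continue  # a pattern with no observed features carries no information
        # Use every sample in which all features of this mask are observed.
        rows = observed[:, mask].all(axis=1)
        if len(np.unique(y[rows])) < 2:
            continue  # need at least two classes to train a classifier
        clf = NearestCentroid().fit(X[rows][:, mask], y[rows])
        ensemble.append((mask, clf))
    return ensemble


def predict_subset_ensemble(ensemble, x):
    """Majority vote over the classifiers whose features are all observed in x."""
    observed = ~np.isnan(x)
    votes = [clf.predict(x[mask][None, :])[0]
             for mask, clf in ensemble if observed[mask].all()]
    if not votes:
        return None  # no trained classifier is applicable to this sample
    labels, counts = np.unique(votes, return_counts=True)
    return labels[counts.argmax()]
```

Each additional mask adds one classifier to train and evaluate, which is the complexity side of the trade-off the abstract describes; a fuller version would merge similar masks instead of keeping every distinct pattern.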
