
Missing values: how many can they be to preserve classification reliability?

ARTIFICIAL INTELLIGENCE REVIEW (2013)

Journal

ARTIFICIAL INTELLIGENCE REVIEW

Volume 40, Issue 3, Pages 231-245

Publisher

SPRINGER

DOI: 10.1007/s10462-011-9282-2

Keywords

Medical data; Missing values; Distance measures; Imputation; Classification; Nearest neighbour searching

Category

Computer Science, Artificial Intelligence


Abstract

Using five medical datasets, we examined the influence of missing values on true positive rates and classification accuracy. We randomly marked progressively more values as missing and tested the effect on classification accuracy. Classification was performed with nearest neighbour searching when 0%, 10%, 20%, 30% or more of the values were missing. We also used discriminant analysis and the naïve Bayesian method for classification. We found that for a two-class dataset, results almost as good as those with no missing values could still be produced even with as many as 20-30% of the values missing. If there are more than two classes, more than 10-20% missing values are probably too many, at least for small classes with relatively few cases. The more classes there are, and the more they differ in size, the more sensitive a classification task is to missing values. On the other hand, when values are missing not fully at random but according to actual distributions shaped by some selection or other non-random cause, classification can tolerate even high numbers of missing values for some datasets.
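The experimental design described in the abstract (mark a growing share of values as missing at random, then measure nearest neighbour accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the synthetic two-class data, the helper names `make_data`, `mask_at_random`, `distance` and `loo_accuracy`, the choice of a Euclidean distance restricted to attributes observed in both cases, and leave-one-out evaluation. The abstract does not state which distance measures, imputation methods or validation scheme the paper used.

```python
import math
import random


def make_data(n_per_class=60, n_features=5, seed=0):
    """Synthetic two-class data (assumption): Gaussian clusters centred at 0 and 2."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            labels.append(label)
    return rows, labels


def mask_at_random(rows, fraction, seed=0):
    """Return a copy in which `fraction` of all cells is replaced by None,
    chosen fully at random and independently of the values themselves."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    chosen = rng.sample(cells, round(fraction * len(cells)))
    masked = [list(r) for r in rows]
    for i, j in chosen:
        masked[i][j] = None
    return masked


def distance(a, b):
    """Euclidean distance over attributes observed in both rows, rescaled to
    the full attribute count; infinite if the rows share no observed attribute."""
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum(shared) * len(a) / len(shared))


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: distance(row, rows[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


if __name__ == "__main__":
    data, y = make_data()
    for frac in (0.0, 0.1, 0.2, 0.3):
        acc = loo_accuracy(mask_at_random(data, frac), y)
        print(f"{frac:.0%} missing: accuracy {acc:.3f}")
```

Running the script prints one accuracy figure per missingness level. On real data the comparison would also need per-class true positive rates, since the abstract reports that small classes degrade first.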
