Article

A study of the effect of different types of noise on the precision of supervised learning techniques

ARTIFICIAL INTELLIGENCE REVIEW (2010)

Journal

ARTIFICIAL INTELLIGENCE REVIEW

Volume 33, Issue 4, Pages 275-306

Publisher

SPRINGER

DOI: 10.1007/s10462-010-9156-z

Keywords

Attribute noise; Class noise; Machine learning techniques; Noise impacts

Categories

Computer Science, Artificial Intelligence

Funding

Ministerio de Educacion y Ciencia [TIN2008-06681-C06-05]
Generalitat de Catalunya [2009SGR-00183]

Abstract

Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models. Therefore, effectively dealing with noise is a key aspect in supervised learning to obtain reliable models from data. Although several authors have studied the effect of noise for some particular learners, comparisons of its effect among different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised learners that belong to different paradigms. Specifically, we consider the Naïve Bayes probabilistic classifier, the C4.5 decision tree, the IBk instance-based learner and the SMO support vector machine. We have selected four methods which enable us to contrast different learning paradigms, and which are considered to be four of the top ten algorithms in data mining (Yu et al. 2007). We test them on a collection of data sets that are perturbed with noise in the input attributes and noise in the output class. As an initial hypothesis, we assign the techniques to two groups, NB with C4.5 and IBk with SMO, based on their proposed sensitivity to noise, the first group being the least sensitive. The analysis enables us to extract key observations about the effect of different types and degrees of noise on these learning techniques. In general, we find that Naïve Bayes appears as the most robust algorithm, and SMO the least, relative to the other two techniques. However, we find that the underlying empirical behavior of the techniques is more complex, and varies depending on the noise type and the specific data set being processed. In general, noise in the training data set is found to give the most difficulty to the learners.
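The abstract distinguishes noise in the input attributes from noise in the output class but does not say how either is injected. The sketch below is one illustrative way to perturb a small data set, not the paper's procedure. It assumes that class noise flips a label to a different class with a given probability, and that attribute noise replaces a numeric value with a uniform draw from that attribute's observed range. The function names `add_class_noise` and `add_attribute_noise`, the toy data, and the 20% rate are all hypothetical.

```python
import random


def add_class_noise(labels, rate, classes, rng):
    """Flip each label to a different class with probability `rate` (assumed scheme)."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            # Pick uniformly among the classes other than the true one.
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def add_attribute_noise(rows, rate, rng):
    """Replace each numeric value, with probability `rate`, by a uniform
    draw from that attribute's observed [min, max] range (assumed scheme)."""
    columns = list(zip(*rows))
    ranges = [(min(col), max(col)) for col in columns]
    noisy = []
    for row in rows:
        noisy.append([
            rng.uniform(lo, hi) if rng.random() < rate else value
            for value, (lo, hi) in zip(row, ranges)
        ])
    return noisy


# Toy data, invented for illustration only.
rng = random.Random(0)
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2]]
y = ["a", "a", "b", "b"]

X_noisy = add_attribute_noise(X, 0.2, rng)
y_noisy = add_class_noise(y, 0.2, ["a", "b"], rng)
```

Applying such functions to the training split only, the test split only, or both would be one way to examine where noise hurts a learner most, which is the kind of comparison the abstract reports.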
