Article

A reconstruction error-based framework for label noise detection

Journal

JOURNAL OF BIG DATA
Volume 8, Issue 1, Pages -

Publisher

Springer Nature
DOI: 10.1186/s40537-021-00447-5

Keywords

Label noise; Autoencoder; PCA; ICA; Tomek links

Funding

  1. National Science Foundation (NSF) [CNS-1427536]

Label noise is a critical data quality issue that hinders machine learning algorithms, increasing model complexity and reducing interpretability. This study explores unsupervised learners (PCA, ICA, and autoencoders) for detecting label noise, with autoencoders showing the best performance.

Label noise is an important data quality issue that negatively impacts machine learning algorithms. For example, it has been shown to increase the number of instances required to train effective predictive models, to increase model complexity, and to decrease model interpretability. It can also degrade a learner's classification performance. In this paper, we detect label noise with three unsupervised learners: principal component analysis (PCA), independent component analysis (ICA), and autoencoders. We evaluate these three learners on a credit card fraud dataset at multiple noise levels and compare the results to the traditional Tomek links noise filter. Our binary classification approach treats label noise instances as anomalies and, uniquely, uses reconstruction errors on the noisy data to identify and filter label noise. For detecting noisy instances, the autoencoder was the top performer (highest recall score of 0.90), while Tomek links performed the worst (highest recall score of 0.62).
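
The idea of treating mislabeled instances as anomalies and ranking them by reconstruction error can be illustrated with a minimal sketch. It uses PCA (via NumPy's SVD) rather than the autoencoder that the abstract reports as the top performer, and it runs on synthetic stand-in data, not the credit card fraud dataset. The function names, the `fraction` threshold, and the data-generating choices are all assumptions made for illustration; the abstract does not specify how the authors threshold errors or configure their learners.

```python
import numpy as np


def pca_reconstruction_error(X, n_components):
    """Squared reconstruction error per row after projecting onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    X_hat = Xc @ W.T @ W  # project onto the subspace, then map back
    return np.sum((Xc - X_hat) ** 2, axis=1)


def flag_suspected_noise(X, n_components, fraction):
    """Flag the `fraction` of rows with the largest reconstruction error (hypothetical rule)."""
    errors = pca_reconstruction_error(X, n_components)
    k = int(round(fraction * len(X)))
    flagged = np.zeros(len(X), dtype=bool)
    if k > 0:
        flagged[np.argsort(errors)[-k:]] = True
    return flagged, errors


rng = np.random.default_rng(0)

# Assumed synthetic data: correctly labeled points vary mostly along 2 of 10 features.
scales = np.array([3.0, 3.0] + [0.05] * 8)
clean = rng.normal(size=(950, 10)) * scales

# Assumed mislabeled points: they carry the same label but vary in all 10 features.
mislabeled = rng.normal(size=(50, 10))

X = np.vstack([clean, mislabeled])
is_noise = np.concatenate([np.zeros(950, dtype=bool), np.ones(50, dtype=bool)])

flagged, errors = flag_suspected_noise(X, n_components=2, fraction=0.05)
recall = (flagged & is_noise).sum() / is_noise.sum()
print(f"recall on injected noise: {recall:.2f}")
```

Recall is used here because it is the metric the abstract reports. In practice the injected noise rate is unknown, so the flagged fraction would have to be chosen by some other criterion.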
