☆ 4.5 Article

Fast mining of distance-based outliers in high-dimensional datasets

DATA MINING AND KNOWLEDGE DISCOVERY (2008)

Journal

DATA MINING AND KNOWLEDGE DISCOVERY

Volume 16, Issue 3, Pages 349-364

Publisher

SPRINGER

DOI: 10.1007/s10618-008-0093-2

Keywords

outlier detection; high-dimensional datasets; approximate k-nearest neighbors; clustering

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Abstract

Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of the existing distance-based outlier detection algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms are unable to deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that we outperform the state-of-the-art algorithm, often by an order of magnitude.
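To make the notion of a distance-based outlier concrete, the sketch below uses one common formulation: score each point by the distance to its k-th nearest neighbor and report the n highest-scoring points. This is only a brute-force baseline that is quadratic in the number of points. It is not RBRP, whose details the abstract does not give. The function names `knn_outlier_scores` and `top_outliers`, the parameters `k` and `n`, and the choice of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Brute-force baseline (hypothetical helper, not the RBRP algorithm):
    every point is compared against every other point.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point in the dataset.
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        # The largest of the k smallest distances is the k-th NN distance.
        scores.append(heapq.nsmallest(k, dists)[-1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest k-th NN distance."""
    scores = knn_outlier_scores(points, k)
    return heapq.nlargest(n, range(len(points)), key=scores.__getitem__)


if __name__ == "__main__":
    # Four points on a unit square plus one point far away from them.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

In this toy dataset the isolated point at index 4 receives the largest score. The nested comparison is what makes the baseline expensive on large datasets, which is the cost that fast distance-based methods aim to reduce.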
