☆ 4.7 Article

Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2015)

期刊

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

卷 27, 期 5, 页码 1369-1382

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TKDE.2014.2365790

关键词

Outlier detection; reverse nearest neighbors; high-dimensional data; distance concentration

类别

Computer Science, Artificial Intelligence Computer Science, Information Systems Engineering, Electrical & Electronic

资金

Ministry of Education, Science and Technological Development of the Republic of Serbia [OI174023]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Outlier detection in high-dimensional data presents various challenges resulting from the curse of dimensionality. A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper, we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points' reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in k-NN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic k-NN method, the angle-based technique designed for high-dimensional data, the density-based local outlier factor and influenced outlierness methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.

Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection

期刊

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection

期刊

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文