4.7 Article

SDCOR: Scalable density-based clustering for local outlier detection in massive-scale datasets

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 228, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.107256

Keywords

Local outlier detection; Massive-scale datasets; Scalable; Density-based clustering; Anomaly detection


This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. The method is scalable: it processes the input data chunk-by-chunk within a limited memory buffer, gradually updates a temporary clustering model to obtain the approximate structure of the original clusters, and then assigns an outlying score to each object. Evaluations show that the proposed method has low linear time complexity and is more effective and efficient than conventional density-based methods that load all data into memory, as well as some fast distance-based methods that operate on disk-resident data.
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike the well-known traditional algorithms, which assume that all the data is memory-resident, our proposed method is scalable and processes the input data chunk-by-chunk within the confines of a limited memory buffer. A temporary clustering model is built in the first phase; it is then gradually updated by analyzing consecutive memory loads of points. At the end of scalable clustering, the approximate structure of the original clusters is obtained. Finally, through another scan of the entire dataset and using a suitable criterion, an outlying score called SDCOR (Scalable Density-based Clustering Outlierness Ratio) is assigned to each object. Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity and is more effective and efficient than the best-known conventional density-based methods, which need to load all data into memory, and also than some fast distance-based methods, which can operate on disk-resident data. © 2021 Elsevier B.V. All rights reserved.
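The two-pass flow described in the abstract (build and update a cluster model chunk by chunk, then rescan to score every object) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the clustering criterion or the scoring formula, so the sketch swaps in simple nearest-centroid absorption in place of density-based clustering and uses distance to the nearest centroid divided by that cluster's RMS radius as the score. The names `ClusterSummary`, `fit_chunks`, `outlier_score`, `absorb_dist`, and `min_size` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    """Compact running summary of a cluster (assumed design, not from the paper)."""
    n: int              # number of absorbed points
    linear_sum: list    # per-coordinate sum of the points
    square_sum: float   # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(v * v for v in p))

    def add(self, p):
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, p)]
        self.square_sum += sum(v * v for v in p)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # RMS distance of members to the centroid, from the running sums.
        c = self.centroid()
        variance = self.square_sum / self.n - sum(v * v for v in c)
        return math.sqrt(max(variance, 0.0))


def fit_chunks(chunks, absorb_dist, min_size):
    """First pass: update cluster summaries one chunk at a time."""
    clusters = []
    for chunk in chunks:  # only the current chunk needs to be in memory
        for p in chunk:
            nearest = min(clusters,
                          key=lambda c: math.dist(p, c.centroid()),
                          default=None)
            if nearest is not None and math.dist(p, nearest.centroid()) <= absorb_dist:
                nearest.add(p)  # close enough: absorb into the existing summary
            else:
                clusters.append(ClusterSummary.from_point(p))
    # Assumption: tiny summaries are treated as noise, not as real clusters.
    return [c for c in clusters if c.n >= min_size]


def outlier_score(p, clusters, eps=1e-9):
    """Second pass: distance to the nearest centroid, scaled by its radius."""
    return min(math.dist(p, c.centroid()) / (c.radius() + eps) for c in clusters)


# Toy data: two tight groups near (0, 0) and (10, 10), plus one stray point.
chunks = [
    [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9)],
    [(0.1, -0.1), (9.9, 10.2), (5.0, 5.0), (-0.1, 0.2)],
]
clusters = fit_chunks(chunks, absorb_dist=1.0, min_size=3)
stray_score = outlier_score((5.0, 5.0), clusters)
inlier_score = outlier_score((0.1, 0.0), clusters)
```

In this toy run the stray point at (5, 5) receives a much larger score than a point inside either group. Because only summaries are kept between chunks, memory use depends on the number of clusters rather than on the dataset size, which is the property the abstract emphasizes.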

