4.7 Article

KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMC.2019.2956527

Keywords

DBSCAN; FLANN; kNN; KNN-BLOCK DBSCAN

Funding

  1. National Natural Science Foundation of China [61673186, 61972010, 61975124, 61722205, 61751205, 61572199, U1611461]
  2. State Key Laboratory of Computer Architecture, ICT, CAS [CARCH201807]
  3. Open Project of Provincial Key Laboratory for Computer Information Processing Technology, Soochow University [KJS1839]
  4. Quanzhou City Science and Technology Program of China [2018C114R]
  5. Open Project of Beijing Key Laboratory of Big Data Technology for Food Safety [BTBD-2019KF06]
  6. Key Research and Development Program of Guang Dong Province [2018B010107002]
  7. Guang Dong Natural Science Funds [2017A030312008]

This article proposes KNN-BLOCK DBSCAN, a fast approximate DBSCAN algorithm built on the kNN problem. It detects core-blocks (CBs), noncore-blocks, and noise-blocks, then merges the CBs and assigns noncore points to the proper clusters.
Large-scale data clustering is essential to big data problems. However, no existing approach is optimal for big data because of high complexity, so it remains a great challenge. In this article, a simple but fast approximate DBSCAN, namely KNN-BLOCK DBSCAN, is proposed based on two findings: 1) identifying whether a point is a core point is, in fact, a kNN problem, and 2) a point has a density distribution similar to that of its neighbors, so neighboring points are highly likely to be of the same type (core point, border point, or noise). KNN-BLOCK DBSCAN uses a fast approximate kNN algorithm, namely FLANN, to detect core-blocks (CBs), noncore-blocks, and noise-blocks, within each of which all points have the same type. A fast algorithm for merging CBs and assigning noncore points to the proper clusters is also devised to speed up the clustering process. The experimental results show that KNN-BLOCK DBSCAN is an effective approximate DBSCAN algorithm with high accuracy and that it outperforms other current variants of DBSCAN, including rho-approximate DBSCAN and AnyDBC.
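
The first finding can be illustrated with a minimal sketch. Under the usual DBSCAN convention that a point counts itself among its neighbors, a point is a core point exactly when the distance to its MinPts-th nearest neighbor is at most eps. The sketch below uses an exact brute-force search in place of FLANN, and all function names (`kth_neighbor_distance`, `is_core`, `block_is_all_core`) are hypothetical, not taken from the article. The half-radius test in `block_is_all_core` is one way to see the second finding through the triangle inequality; it is offered as an assumption about how a core-block could be detected, not as the authors' exact definition.

```python
import math


def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor.

    The point itself is counted as its own nearest neighbor (distance 0),
    matching the DBSCAN convention that a neighborhood contains its center.
    Brute force, O(n log n) per query; an approximate index such as FLANN
    would replace this step in practice.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[k - 1]


def is_core(points, i, eps, min_pts):
    """Core-point test phrased as a kNN query instead of a range query."""
    return kth_neighbor_distance(points, i, min_pts) <= eps


def block_is_all_core(points, i, eps, min_pts):
    """Sufficient condition for the min_pts nearest neighbors to all be core.

    If the min_pts-th neighbor lies within eps / 2 of points[i], any two of
    those neighbors are at most eps apart (triangle inequality), so each of
    them has at least min_pts points within eps.
    """
    return kth_neighbor_distance(points, i, min_pts) <= eps / 2


# Hypothetical toy data: four points in a tight square plus one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
eps, min_pts = 0.5, 4

core_flags = [is_core(points, i, eps, min_pts) for i in range(len(points))]
```

With this toy data, the four clustered points pass the core test and the outlier does not. A single kNN query around the first point is also enough to label its three nearest neighbors as core without querying them individually, which is the kind of saving the block idea aims for.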
