4.7 Article

BLOCK-DBSCAN: Fast clustering for large scale data

Journal

PATTERN RECOGNITION
Volume 109, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107624

Keywords

DBSCAN; rho-approximate DBSCAN; BLOCK-DBSCAN; Core block

Funding

  1. National Science Foundation of China [61673186, 71771094, 61972010]
  2. Project of science and technology plan of Fujian Province of China [2017H01010065]
  3. Quanzhou City Science & Technology Program of China [2018Z0088, 2018C114R]


The article analyzes the drawbacks of DBSCAN and its variants and proposes two techniques to address them: a xi-norm ball for identifying Inner Core Blocks, and a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. A cover tree is also used to accelerate density computations, and the resulting method, BLOCK-DBSCAN, is aimed at large-scale data.
We analyze the drawbacks of DBSCAN and its variants, and find that the grid technique used in Fast-DBSCAN and rho-approximate DBSCAN is almost useless in high-dimensional data spaces because it usually yields considerable redundant distance computations. To tame these problems, two techniques are proposed. The first uses a xi-norm ball to identify Inner Core Blocks, within which all points are core points; it is more efficient than the grid technique because it finds more core points at one time. The second is a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. In addition, a cover tree is used to accelerate the density computations. Based on these three techniques, an approximate approach named BLOCK-DBSCAN is proposed for large-scale data; it runs in about O(n log n) expected time and obtains almost the same result as DBSCAN. BLOCK-DBSCAN has two versions: the L_2 version works well for relatively high-dimensional data, and the L_infinity version is suitable for high-dimensional data. Experimental results show that BLOCK-DBSCAN is promising and outperforms NQDBSCAN, rho-approximate DBSCAN and AnyDBC. (C) 2020 Elsevier Ltd. All rights reserved.
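
The Inner Core Block idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes the ball has radius eps/2 under the L_2 norm (the abstract only says "xi-norm ball"), replaces the cover tree with brute-force distance scans, and uses hypothetical function names. The reasoning it demonstrates is the triangle inequality: if at least MinPts points lie within eps/2 of a center, any two of them are at most eps apart, so every point in the ball is a core point and needs no separate range query.

```python
import numpy as np


def inner_core_blocks(X, eps, min_pts):
    """Greedy sketch: collect balls of radius eps/2 that hold >= min_pts points.

    Assumption: radius eps/2 and the L2 norm. A brute-force scan stands in
    for the cover tree mentioned in the abstract.
    """
    n = len(X)
    assigned = np.zeros(n, dtype=bool)
    blocks = []
    for i in range(n):
        if assigned[i]:
            continue  # already covered by an earlier block
        dist = np.linalg.norm(X - X[i], axis=1)
        members = np.flatnonzero(dist <= eps / 2)  # includes point i itself
        if len(members) >= min_pts:
            # Every member has all other members within eps, so all are core.
            blocks.append(members)
            assigned[members] = True
    return blocks


def brute_force_core_mask(X, eps, min_pts):
    """Reference DBSCAN core test: count neighbours within eps (self included)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (d <= eps).sum(axis=1) >= min_pts


# Two synthetic Gaussian blobs (made-up data for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(3, 0.3, (200, 2))])
eps, min_pts = 0.5, 10

blocks = inner_core_blocks(X, eps, min_pts)
core_mask = brute_force_core_mask(X, eps, min_pts)
flagged = np.unique(np.concatenate(blocks)) if blocks else np.array([], dtype=int)
```

Comparing `flagged` against `core_mask` checks that every point labeled through a block is also a core point under the ordinary DBSCAN definition. Core points that fall outside every block would still need an individual range query in a full algorithm.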

