4.7 Article

Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform

Journal

REMOTE SENSING
Volume 9, Issue 12, Pages -

Publisher

MDPI
DOI: 10.3390/rs9121301

Keywords

spatial data mining; DBSCAN algorithm; parallel computing; spark platform; traffic congestion area discovery

Funding

  1. Key Laboratory of Spatial Data Mining & Information Sharing of the Ministry of Education, Fuzhou University [2017LSDMIS03, 2016LSDMIS06]
  2. Hubei Provincial Key Laboratory of Intelligent Geo-information Processing (China University of Geosciences) [KLIGIP2016A03]
  3. Engineering Research Center of Geospatial Information and Digital Technology (NASG) [SIDT20170601]
  4. Fundamental Research Funds for the Central Universities [ZYGX2015J111]
  5. National Key Research and Development program of China [2017YFB0504202]
  6. National Science Foundation of the United States [1251095, 1723292]
  7. Directorate for Computer & Information Science & Engineering
  8. Division of Information & Intelligent Systems [1723292, 1251095] (funding source: National Science Foundation)


Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that can discover clusters of arbitrary shape, effectively distinguish noise points, and naturally support spatial databases. It has been widely used in spatial data mining. This paper studies the design and implementation of a parallel DBSCAN algorithm on the Spark platform, and addresses the following problems that arise when processing massive data: the heavy computational load of the single-node algorithm; the low resource utilization of the multi-node algorithm; the long running time; and the lack of real-time responsiveness. The experimental results indicate that the proposed parallel algorithm achieves a more stable speedup as the scale of the spatial data increases.
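
For orientation, the single-node baseline that the abstract refers to can be sketched as follows. This is a minimal textbook DBSCAN in Python, not the authors' Spark implementation; the function names and the brute-force neighbor search are assumptions made for illustration, and `eps` and `min_pts` stand for the usual DBSCAN neighborhood radius and density threshold. The quadratic neighbor search gives a sense of why a single node becomes a bottleneck on large spatial datasets.

```python
from math import hypot

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may later be relabeled as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is also a core point, so its neighborhood joins the cluster.
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels
```

Calling `dbscan(points, eps, min_pts)` on a list of `(x, y)` tuples returns one label per point. Each call to `region_query` scans every point, so the sketch costs on the order of n² distance computations; a partitioned or indexed neighbor search would be needed to distribute that work across nodes.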
