Article

DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce

INFORMATION SYSTEMS (2014)

Journal

INFORMATION SYSTEMS

Volume 42, Pages 15-35

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.is.2013.11.002

Keywords

Clustering algorithm; Density-based clustering; Parallel algorithm; MapReduce

Category

Computer Science, Information Systems

Funding

Basic Science Research Program [NRF-2009-0078828]
Next-Generation Information Computing Development Program [NRF-2012M3C4A7033342]
National Research Foundation of Korea (NRF) of the Ministry of Science, ICT & Future Planning (MSIP)
Information Technology Research Center (ITRC)
National IT Industry Promotion Agency (NIPA) of MSIP [NIPA-2013-H0301-13-4009]
SK Planet Cooperation

Abstract

Clustering is a useful data mining technique that groups data points such that points within a single group have similar characteristics, while points in different groups are dissimilar. Density-based clustering algorithms such as DBSCAN and OPTICS are a widely used class of clustering algorithms. As more and more applications deal with vast amounts of data, clustering such big data has become a challenging problem. Recently, parallelizing clustering algorithms on large clusters of commodity machines using the MapReduce framework has received a lot of attention. In this paper, we first propose a new density-based clustering algorithm, called DBCURE, which robustly finds clusters with varying densities and is suitable for parallelization with MapReduce. We next develop DBCURE-MR, a parallelized version of DBCURE using MapReduce. While traditional density-based algorithms find clusters one by one, our DBCURE-MR finds several clusters together in parallel. We prove that both DBCURE and DBCURE-MR find clusters correctly according to the definition of density-based clusters. Our experimental results on various data sets confirm that DBCURE-MR finds clusters efficiently, is not sensitive to clusters with varying densities, and scales up well with the MapReduce framework. (C) 2013 Published by Elsevier Ltd.
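The abstract contrasts DBCURE-MR with traditional density-based algorithms that find clusters one by one. The sketch below is for reference only. It implements classic sequential DBSCAN, not DBCURE or DBCURE-MR, whose details are not given here. It assumes Euclidean distance and the usual `eps` / `min_pts` parameters, and the names `dbscan` and `region_query` are hypothetical. The outer loop finishes one cluster's breadth-first expansion before it starts the next. That is the sequential behavior DBCURE-MR is designed to avoid.

```python
import math
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic sequential DBSCAN (illustrative sketch, not DBCURE).

    Returns one label per point: cluster ids 0, 1, ... or NOISE (-1).
    Each cluster is expanded completely before the next one is started.
    """
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and grow it to completion.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, so keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

On the sample points this yields two clusters and one noise point, `[0, 0, 0, 1, 1, 1, -1]`. `region_query` uses a brute-force scan, so each query costs O(n) and the whole run is O(n²). That cost is one reason parallel approaches such as MapReduce become attractive for large data.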