Journal
PATTERN RECOGNITION LETTERS
Volume 93, Issue -, Pages 78-84
Publisher
ELSEVIER
DOI: 10.1016/j.patrec.2016.07.027
Keywords
Artificial Bee Colony (ABC); MapReduce; Data mining; Clustering; Distributed computing; Hadoop
Funding
- Faculty of Engineering at Sriracha, Kasetsart University Sriracha Campus [2559/1]
Advances in technology have been a significant factor in the growth of digital data, so good data analysis is necessary for making better decisions. Clustering is one of the most important elements of data analysis, but clustering very large datasets remains a primary concern. Computational models that can cluster huge volumes of data within a reasonable amount of time are therefore required. MapReduce is a powerful programming model, with an associated implementation, for processing large datasets with a parallel, distributed algorithm on a computing cluster. In this paper, a MapReduce-based artificial bee colony algorithm called MR-ABC is proposed for data clustering. The ABC is implemented on the MapReduce model in the Hadoop framework and used to optimize the assignment of large numbers of data instances to clusters, with the objective of minimizing the sum of the squared Euclidean distances between each data instance and the centroid of the cluster to which it belongs. The experimental results demonstrate that the proposed algorithm is well suited to massive amounts of data while maintaining the quality of the clustering results. (C) 2016 Elsevier B.V. All rights reserved.
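The clustering objective described in the abstract (the sum of squared Euclidean distances from each data instance to its nearest centroid) can be illustrated with a small map/reduce-style sketch. This is not the paper's MR-ABC implementation and does not use Hadoop; it is plain Python under the assumption that a candidate solution (one bee's food source) is a list of centroids and that the data arrive in chunks, as input splits would. All names (`map_phase`, `reduce_phase`, `fitness`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(chunk: Iterable[Point], centroids: List[Point]) -> List[Tuple[int, float]]:
    """Assumed mapper: emit (nearest centroid index, squared distance) per point."""
    pairs = []
    for point in chunk:
        distances = [squared_distance(point, c) for c in centroids]
        nearest = min(range(len(centroids)), key=distances.__getitem__)
        pairs.append((nearest, distances[nearest]))
    return pairs


def reduce_phase(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Assumed reducer: sum the squared distances emitted for each cluster."""
    totals: Dict[int, float] = defaultdict(float)
    for cluster_id, dist in pairs:
        totals[cluster_id] += dist
    return dict(totals)


def fitness(chunks: Iterable[Iterable[Point]], centroids: List[Point]) -> float:
    """Objective value of one candidate set of centroids (lower is better)."""
    pairs = [p for chunk in chunks for p in map_phase(chunk, centroids)]
    return sum(reduce_phase(pairs).values())


if __name__ == "__main__":
    # Two chunks stand in for two input splits of a larger dataset.
    chunks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(fitness(chunks, [(0.0, 1.0), (10.0, 11.0)]))
```

In an ABC-style search, a value like this would be computed for each candidate set of centroids and the candidates with lower totals preferred; how MR-ABC distributes that evaluation across Hadoop jobs is not specified in the abstract.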