4.6 Article

A survey on parallel clustering algorithms for Big Data

Journal

ARTIFICIAL INTELLIGENCE REVIEW
Volume 54, Issue 4, Pages 2411-2443

Publisher

SPRINGER
DOI: 10.1007/s10462-020-09918-2

Keywords

Algorithms; Big Data; Clustering; Data mining; DBSCAN; FPGA; GPU; k-means; MapReduce; MPI; Multi-cores CPU; Spark

Recent research has produced many parallel clustering algorithms that apply parallel computing to address the speed and scalability limits of traditional clustering algorithms in the Big Data context. The survey groups these algorithms by the Big Data processing platform they target, in two categories: horizontal scaling platforms and vertical scaling platforms.
Data clustering is one of the most studied data mining tasks. It aims, through various methods, to discover previously unknown groups within data sets. In recent years, considerable progress has been made in this field, leading to the development of innovative and promising clustering algorithms. However, these traditional clustering algorithms have serious limitations in speed-up, throughput, and scalability. They can therefore no longer be used directly in the context of Big Data, where data are mainly characterized by their volume, velocity, and variety. To overcome these limitations, current research is turning to parallel computing, giving rise to the so-called parallel clustering algorithms. This paper presents an overview of the latest parallel clustering algorithms, categorized according to the computing platforms used to handle Big Data, namely horizontal and vertical scaling platforms. The former category includes peer-to-peer networks, MapReduce, and Spark platforms, while the latter includes multi-core processor, Graphics Processing Unit (GPU), and Field Programmable Gate Array (FPGA) platforms. In addition, it compares the performance of the reviewed algorithms based on some common criteria of clustering validation in the Big Data context. It thus gives the reader an overall view of current parallel clustering techniques.
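To make the horizontal-scaling idea concrete, the sketch below shows how one k-means iteration could be split into a map step and a reduce step, in the spirit of the MapReduce platforms named above. It is an illustration only and is not taken from the surveyed paper: the function names (`nearest`, `map_partition`, `reduce_partials`, `kmeans_step`) are hypothetical, the data is assumed to be pre-split into partitions, and a thread pool stands in for the worker nodes of a real cluster, so it shows the structure of the computation rather than any actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(partition, centroids):
    """Map step: per-cluster partial sums and counts for one data partition."""
    dim = len(centroids[0])
    partial = {}
    for point in partition:
        k = nearest(point, centroids)
        acc, count = partial.get(k, ([0.0] * dim, 0))
        partial[k] = ([a + p for a, p in zip(acc, point)], count + 1)
    return partial


def reduce_partials(partials, centroids):
    """Reduce step: merge the partial sums and compute the new centroids."""
    dim = len(centroids[0])
    totals = {}
    for partial in partials:
        for k, (vec, count) in partial.items():
            acc, n = totals.get(k, ([0.0] * dim, 0))
            totals[k] = ([a + v for a, v in zip(acc, vec)], n + count)
    # A cluster that received no points keeps its previous centroid (an assumption).
    return [
        [s / totals[k][1] for s in totals[k][0]] if k in totals else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans_step(partitions, centroids, workers=2):
    """One k-means iteration: map over the partitions concurrently, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: map_partition(part, centroids), partitions))
    return reduce_partials(partials, centroids)


if __name__ == "__main__":
    partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    centroids = [(1.0, 1.0), (9.0, 9.0)]
    print(kmeans_step(partitions, centroids))
```

Only the small per-cluster sums and counts travel from the map step to the reduce step, not the points themselves. That property is what lets this kind of formulation spread across many machines.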
