
A survey on parallel clustering algorithms for Big Data

ARTIFICIAL INTELLIGENCE REVIEW (2021)

Journal

ARTIFICIAL INTELLIGENCE REVIEW

Volume 54, Issue 4, Pages 2411-2443

Publisher

SPRINGER

DOI: 10.1007/s10462-020-09918-2

Keywords

Algorithms; Big Data; Clustering; Data mining; DBSCAN; FPGA; GPU; k-means; MapReduce; MPI; Multi-cores CPU; Spark

Automated Summary

Recent research has developed many parallel clustering algorithms to address the speed and scalability issues of traditional clustering algorithms in the Big Data context. The survey divides these algorithms into two categories according to the Big Data processing platform they target: horizontal scaling platforms and vertical scaling platforms.

Abstract

Data clustering is one of the most studied data mining tasks. It aims, through various methods, to discover previously unknown groups within data sets. In recent years, considerable progress in this field has led to the development of innovative and promising clustering algorithms. However, these traditional clustering algorithms present serious issues with respect to speed-up, throughput, and scalability. Thus, they can no longer be used directly in the context of Big Data, where data are mainly characterized by their volume, velocity, and variety. To overcome these limitations, current research is turning to parallel computing, giving rise to so-called parallel clustering algorithms. This paper presents an overview of the latest parallel clustering algorithms, categorized according to the computing platforms used to handle Big Data, namely horizontal and vertical scaling platforms. The former category includes peer-to-peer networks, MapReduce, and Spark platforms, while the latter includes multi-core processors, Graphics Processing Unit (GPU), and Field Programmable Gate Array (FPGA) platforms. In addition, it compares the performance of the reviewed algorithms based on common clustering validation criteria in the Big Data context. It thus provides the reader with an overall view of current parallel clustering techniques.
