Article

Parallel batch k-means for Big data clustering

COMPUTERS & INDUSTRIAL ENGINEERING (2021)

Journal

COMPUTERS & INDUSTRIAL ENGINEERING

Volume 152

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.cie.2020.107023

Keywords

Big data; Clustering; Batch; K-means; Parallel batch clustering

Funding

Science Foundation of the State Oil Company of the Azerbaijan Republic [03LR -AMEA]

Automated Summary

This article introduces a parallel batch clustering algorithm based on k-means. The algorithm reduces computational complexity by splitting the dataset into multiple partitions, and the article proposes a method for determining the optimal batch size. Experimental results show that the method is practical for handling Big Data.

Abstract

The application of clustering algorithms is expanding due to the rapid growth of data volumes. Nevertheless, existing algorithms are not always effective because of their high computational complexity. A new parallel batch clustering algorithm based on the k-means algorithm is proposed. The proposed algorithm splits a dataset into equal partitions and thereby limits the exponential growth of computations. The goal is to preserve the characteristics of the dataset while increasing the clustering speed. Cluster centers are calculated for each partition; these centers are then merged and clustered again. An approach for determining the optimal batch size is also considered, and the statistical significance of the proposed approach is assessed. Six experimental datasets are used to evaluate the effectiveness of the proposed parallel batch clustering, and the results are compared with those of the standard k-means algorithm. The analysis shows the practical applicability of the proposed algorithm to Big Data.
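The partition, cluster, merge, and re-cluster pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans`, `parallel_batch_kmeans`), the shuffle before partitioning, the fixed iteration cap, and the use of a thread pool are all choices made for this sketch. The abstract's method for choosing the optimal batch size is not reproduced here, so `n_batches` is simply a caller-supplied parameter.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def parallel_batch_kmeans(points, k, n_batches, seed=0):
    """Cluster each partition separately, then cluster the merged centers.

    Assumption of this sketch: the data is shuffled first so that each
    partition is a rough sample of the whole dataset.
    """
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Striding yields n_batches partitions whose sizes differ by at most one.
    batches = [shuffled[i::n_batches] for i in range(n_batches)]
    # Threads keep the sketch portable; a process pool would be needed
    # for real CPU-bound speedups in CPython.
    with ThreadPoolExecutor() as pool:
        per_batch = list(pool.map(lambda b: kmeans(b, k, seed=seed), batches))
    merged = [c for centers in per_batch for c in centers]
    # Second stage: cluster the k * n_batches partition centers down to k.
    return kmeans(merged, k, seed=seed)
```

Calling `parallel_batch_kmeans(data, k=2, n_batches=4)` on a list of coordinate tuples returns two merged centers. Each partition must contain at least `k` points, and the second stage treats every partition center with equal weight, which is a simplification that may not match the paper.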
