Article

A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark

ENTROPY (2023)

Journal

ENTROPY

Volume 25, Issue 2

Publisher

MDPI

DOI: 10.3390/e25020259

Keywords

multiobjective clustering; Apache Spark; multiobjective particle swarm optimization (MOPSO)

Category

Physics, Multidisciplinary

Abstract

This paper proposes a parallel multiobjective PSO weighted average clustering algorithm based on Apache Spark. The algorithm divides the entire dataset into multiple partitions and caches the data in memory using distributed parallel and memory-based computing of Apache Spark. The local fitness value of each particle is calculated in parallel according to the data in each partition, reducing the communication of data in the network. Additionally, a weighted average calculation of the local fitness values is performed to improve the problem of unbalanced data distribution affecting the results.

Multiobjective clustering algorithms based on particle swarm optimization have been applied successfully in several applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it difficult for them to handle large-scale data. With the development of distributed parallel computing frameworks, data parallelism has been proposed. However, increasing the degree of parallelism leads to unbalanced data distribution, which affects the clustering results. In this paper, we propose a parallel multiobjective PSO weighted average clustering algorithm based on Apache Spark (Spark-MOPSO-Avg). First, the entire dataset is divided into multiple partitions and cached in memory, using the distributed parallel and memory-based computing of Apache Spark. The local fitness value of each particle is calculated in parallel from the data in each partition. After the calculation is completed, only particle information is transmitted; there is no need to transmit large numbers of data objects between nodes, which reduces network communication and thus effectively reduces the algorithm's running time. Second, a weighted average of the local fitness values is computed to mitigate the effect of unbalanced data distribution on the results. Experimental results show that the Spark-MOPSO-Avg algorithm incurs relatively little information loss under data parallelism, losing about 1% to 9% accuracy, while effectively reducing the algorithm's time overhead. It shows good execution efficiency and parallel computing capability on a Spark distributed cluster.
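The two steps described in the abstract (a local fitness value per partition, then a weighted average of those local values) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the paper's objective functions or weights, so the objective used here (mean squared distance to the nearest centroid), the choice of partition size as the weight, and all function names are hypothetical. Partitions are simulated with plain Python lists; in PySpark the per-partition step would correspond to something like `RDD.mapPartitions`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_sq_dist(p: Point, centroids: Sequence[Point]) -> float:
    """Squared Euclidean distance from point p to its closest centroid."""
    return min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids)


def local_fitness(partition: Sequence[Point],
                  centroids: Sequence[Point]) -> Tuple[float, int]:
    """Hypothetical per-partition objective for one particle (= one set of
    centroids): mean squared distance of the partition's points to their
    nearest centroid. Only (local value, partition size) is returned, so no
    data objects need to leave the partition."""
    if not partition:
        return 0.0, 0
    total = sum(nearest_sq_dist(p, centroids) for p in partition)
    return total / len(partition), len(partition)


def weighted_average(local_results: Sequence[Tuple[float, int]]) -> float:
    """Combine local values, weighting each partition by its size
    (an assumed weighting scheme)."""
    n = sum(size for _, size in local_results)
    return sum(value * size for value, size in local_results) / n


def unweighted_average(local_results: Sequence[Tuple[float, int]]) -> float:
    """Naive combination that ignores partition sizes, for comparison."""
    values = [value for value, size in local_results if size > 0]
    return sum(values) / len(values)


if __name__ == "__main__":
    centroids = [(0.0, 0.0)]  # one particle's candidate cluster centre
    # Deliberately unbalanced partitions: 1 point vs. 3 points.
    partitions: List[List[Point]] = [
        [(2.0, 0.0)],
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    ]
    # Step 1: local fitness per partition (parallel on a real cluster).
    local_results = [local_fitness(part, centroids) for part in partitions]
    # Reference value computed over the whole dataset at once.
    all_points = [p for part in partitions for p in part]
    global_value = local_fitness(all_points, centroids)[0]
    # Step 2: combine the small (value, size) tuples on the driver.
    print(weighted_average(local_results))    # 1.5, same as global_value
    print(unweighted_average(local_results))  # about 2.33, skewed by the 1-point partition
    print(global_value)                       # 1.5
```

With size-proportional weights, the combined value matches the objective computed over the whole dataset, whereas the plain mean of the partition values over-represents the single-point partition. This illustrates how unbalanced partitions can skew a naive combination of local fitness values.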
