Article

Big optimization with genetic algorithms: Hadoop, Spark, and MPI

Journal

SOFT COMPUTING
Volume 27, Issue 16, Pages 11469-11484

Publisher

SPRINGER
DOI: 10.1007/s00500-023-08301-x

Keywords

Big optimization; Genetic algorithms; MapReduce; Hadoop; Spark; MPI


Abstract

This article discusses the use of MapReduce as a computing paradigm for solving large-scale combinatorial optimization problems, focusing on the potential and advantages of developing genetic algorithms on Hadoop, Spark, and MPI as middleware platforms. The results show that the MapReduce genetic algorithm (MRGA) performs better on the Hadoop framework than on Spark and MPI when dealing with high-dimensional datasets.
Solving problems of high dimensionality (and complexity) usually requires the intensive use of technologies such as parallelism, advanced computers, and new types of algorithms. MapReduce (MR) is a long-established computing paradigm in computer science that has in recent years been proposed for big data applications, though it can also be used for many other tasks. In this article, we address big optimization: solving large instances of combinatorial optimization problems by using MR as the paradigm for designing solvers that run transparently on a varying number of computers collaborating to find the solution. We study and analyze MR technology, focusing on Hadoop, Spark, and MPI as the middleware platforms on which to develop genetic algorithms (GAs). From this study, MRGA solvers arise, built on a programming paradigm different from the usual imperative, transformational one. Our objective is to confirm the expected benefits of these systems, namely file, memory, and communication management, for the resulting algorithms. We analyze our MRGA solvers from relevant points of view such as scalability, speedup, and communication versus computation time in big optimization. The results for high-dimensional datasets show that the MRGA over Hadoop outperforms the implementations on the Spark and MPI frameworks. For the smallest datasets, the execution of the MRGA on MPI is always faster than the executions of the remaining MRGAs. Finally, the MRGA over Spark presents the lowest communication times. Numerical and timing insights are given in our work so as to ease future comparisons of new algorithms over these three popular technologies.
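To make the MRGA idea concrete, the sketch below simulates one MapReduce round as one GA generation on a single machine: a map phase evaluates fitness per individual, a shuffle groups individuals into partitions, and a reduce phase applies selection, crossover, and mutation within each partition. This is only an illustrative sketch under assumed operators (OneMax fitness, tournament selection, one-point crossover), not the authors' implementation or any Hadoop/Spark/MPI API.

```python
import random

def fitness(chrom):
    # OneMax: count of 1-bits; a stand-in objective, not the paper's benchmark.
    return sum(chrom)

def map_phase(population):
    # Map: each "mapper" evaluates one chromosome independently,
    # emitting (partition_key, (chromosome, fitness)) pairs.
    return [(i % 2, (c, fitness(c))) for i, c in enumerate(population)]

def shuffle(pairs):
    # Shuffle: group evaluated individuals by key, as the MR
    # framework would do between the map and reduce phases.
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

def reduce_phase(group, rng):
    # Reduce: within one partition, breed offspring via tournament
    # selection, one-point crossover, and bit-flip mutation.
    offspring = []
    for _ in range(len(group)):
        a = max(rng.sample(group, 2), key=lambda cf: cf[1])[0]
        b = max(rng.sample(group, 2), key=lambda cf: cf[1])[0]
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        if rng.random() < 0.1:  # mutation probability (assumed value)
            j = rng.randrange(len(child))
            child[j] ^= 1
        offspring.append(child)
    return offspring

def mrga_generation(population, rng):
    # One MR round = one GA generation: map -> shuffle -> reduce.
    groups = shuffle(map_phase(population))
    new_pop = []
    for group in groups.values():
        new_pop.extend(reduce_phase(group, rng))
    return new_pop

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
for _ in range(30):
    pop = mrga_generation(pop, rng)
best = max(fitness(c) for c in pop)
```

In a real deployment the map and reduce functions would be distributed by the middleware (Hadoop tasks, Spark transformations, or MPI processes), which is precisely where the file, memory, and communication management differences studied in the article come from.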


