4.6 Article

Acceleration of Bi-Objective Optimization of Data-Parallel Applications for Performance and Energy on Heterogeneous Hybrid Platforms

期刊

IEEE ACCESS
卷 11, 期 -, 页码 27226-27245

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3258684

关键词

Heuristic algorithms; Optimization; Clustering algorithms; High performance computing; Graphics processing units; Energy consumption; Multicore processing; Energy management; High-performance heterogeneous computing; energy-efficient computing; bi-objective optimization; performance optimization; energy optimization; data-parallel applications; workload distribution

向作者/读者索取更多资源

Accelerating the bi-objective optimization of applications for performance and energy is crucial. This work highlights the challenges of accelerating model-based methods for bi-objective optimization of data parallel applications. The proposed algorithms provide significant speedups over state-of-the-art solutions.
Accelerating the bi-objective optimization of applications for performance and energy is crucial to achieving energy efficiency objectives and meeting quality-of-service requirements in modern high-performance computing platforms and cloud computing infrastructures. In this work, we highlight the crucial challenges to accelerate model-based methods proposed for the bi-objective optimization of data parallel applications for performance and energy that employ workload distribution between the executing processors as the decision variable. The methods solve unconstrained bi-objective optimization problems and take input, the processors' performance and energy profiles in the form of discrete functions of workload size, and output Pareto-optimal solutions (workload distributions), minimizing the execution time and the total energy consumption of computations during the parallel execution of the application. One of the challenges is the fast computation of Pareto-optimal solutions. We then formulate the bi-objective optimization problem of data-parallel applications for performance and energy through workload distribution on a cluster of p identical hybrid nodes, each containing h heterogeneous processors. The state-of-the-art algorithm for solving the problem is sequential and takes exorbitant execution times to find Pareto-optimal solutions for even moderate numbers of processors. We propose two algorithms that address this shortcoming. The first algorithm is an exact sequential algorithm that is more efficient and amenable to parallelization and achieves a complexity reduction of O(m x h) over the state-of-the-art sequential algorithm where m is the cardinality of the input discrete execution time and dynamic energy functions. The second algorithm is a parallel algorithm executed by q identical parallel processes that reduces the complexity of our proposed sequential algorithm by O(q). It, therefore, achieves a complexity reduction of O(m x h x q) over the state-of-the-art sequential algorithm. Finally, we experimentally analyze the practical efficacy of our proposed algorithms for two data-parallel applications, matrix multiplication and fast Fourier transform, on a heterogeneous hybrid node containing an Intel Haswell multicore CPU, an Nvidia k40c GPU, and an Nvidia P100 GPU and simulations of clusters of such hybrid nodes. The experiments demonstrate that our proposed algorithms provide tremendous speedups over state-of-the-art solutions.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据