期刊
IEEE TRANSACTIONS ON SIGNAL PROCESSING
卷 70, 期 -, 页码 3371-3385出版社
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSP.2022.3185905
关键词
Task analysis; Delays; Resource management; Encoding; Distributed computing; Computational modeling; Markov processes; Coded computation; communication delay; Markov's inequality; convex optimization; successive convex approximation
资金
- National Key R&D Program of China [2020YFB1806605]
- National Natural Science Foundation of China [62022049, 61871254]
- China Postdoctoral Science Foundation [2020M680558]
- Hitachi Ltd.
- European Research Council [677854]
- U.K. EPSRC [EP/T023600/1, CHISTERA-18-SDCDN001]
This paper investigates a distributed computing scenario with multiple master hosts and heterogeneous workers, aiming to reduce the straggler effect and minimize task delays through coded computation and optimized task allocation. The proposed algorithms improve task completion delay compared to benchmarks.
Distributed computing enables large-scale computation tasks to be processed by multiple workers in parallel. However, the randomness of communication and computation delays across the workers causes the straggler effect, which may degrade the delay performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and task assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to the workers with different computing capabilities. The goal is to minimize the communication plus computation delay of all the tasks. We propose joint worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks from either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing the Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据