☆ 4.7 Article

Coded Computation Across Shared Heterogeneous Workers With Communication Delay

IEEE TRANSACTIONS ON SIGNAL PROCESSING (2022)

期刊

IEEE TRANSACTIONS ON SIGNAL PROCESSING

卷 70, 期 -, 页码 3371-3385

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TSP.2022.3185905

关键词

Task analysis; Delays; Resource management; Encoding; Distributed computing; Computational modeling; Markov processes; Coded computation; communication delay; Markov's inequality; convex optimization; successive convex approximation

类别

Engineering, Electrical & Electronic

资金

National Key R&D Program of China [2020YFB1806605]
National Natural Science Foundation of China [62022049, 61871254]
China Postdoctoral Science Foundation [2020M680558]
Hitachi Ltd.
European Research Council [677854]
U.K. EPSRC [EP/T023600/1, CHISTERA-18-SDCDN001]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This paper investigates a distributed computing scenario with multiple master hosts and heterogeneous workers, aiming to reduce the straggler effect and minimize task delays through coded computation and optimized task allocation. The proposed algorithms improve task completion delay compared to benchmarks.

Distributed computing enables large-scale computation tasks to be processed by multiple workers in parallel. However, the randomness of communication and computation delays across the workers causes the straggler effect, which may degrade the delay performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and task assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to the workers with different computing capabilities. The goal is to minimize the communication plus computation delay of all the tasks. We propose joint worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks from either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing the Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks.

Coded Computation Across Shared Heterogeneous Workers With Communication Delay

期刊

IEEE TRANSACTIONS ON SIGNAL PROCESSING

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Coded Computation Across Shared Heterogeneous Workers With Communication Delay

期刊

IEEE TRANSACTIONS ON SIGNAL PROCESSING

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文