4.7 Article

Coded Computation Across Shared Heterogeneous Workers With Communication Delay

期刊

IEEE TRANSACTIONS ON SIGNAL PROCESSING
卷 70, 期 -, 页码 3371-3385

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSP.2022.3185905

关键词

Task analysis; Delays; Resource management; Encoding; Distributed computing; Computational modeling; Markov processes; Coded computation; communication delay; Markov's inequality; convex optimization; successive convex approximation

资金

  1. National Key R&D Program of China [2020YFB1806605]
  2. National Natural Science Foundation of China [62022049, 61871254]
  3. China Postdoctoral Science Foundation [2020M680558]
  4. Hitachi Ltd.
  5. European Research Council [677854]
  6. U.K. EPSRC [EP/T023600/1, CHISTERA-18-SDCDN001]

向作者/读者索取更多资源

This paper investigates a distributed computing scenario with multiple master hosts and heterogeneous workers, aiming to reduce the straggler effect and minimize task delays through coded computation and optimized task allocation. The proposed algorithms improve task completion delay compared to benchmarks.
Distributed computing enables large-scale computation tasks to be processed by multiple workers in parallel. However, the randomness of communication and computation delays across the workers causes the straggler effect, which may degrade the delay performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and task assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to the workers with different computing capabilities. The goal is to minimize the communication plus computation delay of all the tasks. We propose joint worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks from either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing the Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据