Article

Compressed Coded Distributed Computing

Journal

IEEE TRANSACTIONS ON COMMUNICATIONS
Volume 69, Issue 5, Pages 2773-2783

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCOMM.2021.3054906

Keywords

Task analysis; Training; Distributed computing; Encoding; Bandwidth; Multicast communication; Machine learning; MapReduce; distributed training; gradient aggregation; coded multicasting

Summary

Communication overhead is a major bottleneck in large-scale distributed computing systems, particularly for machine learning applications. The development of coded distributed computing (CDC) has shown that coding opportunities across different computation tasks can reduce the communication load. Compressed coded distributed computing combines compression and coding techniques to significantly reduce the communication load, outperforming both conventional combining methods and the CDC scheme.
Abstract

Communication overhead is one of the major performance bottlenecks in large-scale distributed computing systems, in particular for machine learning applications. Conventionally, compression techniques are used to reduce the communication load by combining intermediate results of the same computation task as much as possible. Recently, the development of coded distributed computing (CDC) has shown that it is possible to create coding opportunities across intermediate results of different computation tasks to further reduce the communication load. We propose a new scheme, named compressed coded distributed computing (compressed CDC for short), which jointly exploits these two techniques, i.e., combining intermediate results of the same computation and coding across intermediate results of different computations, to significantly reduce the communication load for computations whose final stage is a linear aggregation (reduction) of intermediate results. Such computations are prevalent in machine learning, e.g., distributed training algorithms in which partial gradients are computed distributedly and then averaged in the final stage. In particular, compressed CDC first compresses/combines several intermediate results for a single computation, and then utilizes multiple such combined packets to create a coded multicast packet that is simultaneously useful for multiple computations. We characterize the achievable communication load of compressed CDC and show that it substantially outperforms both the combining methods and the CDC scheme. Based on the compressed CDC technique, we then study a distributed training problem as one of its applications. We characterize the communication load for this distributed training problem and show that it is asymptotically optimal.
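To make the two ingredients named in the abstract concrete, below is a minimal toy sketch, not the paper's construction or achievability scheme. It assumes 3 workers, a storage pattern where each data segment is replicated on 2 workers, a hypothetical `partial_grad` function standing in for per-batch gradient computation, and real-valued addition/subtraction in place of the finite-field/quantized packet coding a real compressed CDC scheme would use. It only illustrates how combining same-computation results (compression) and mixing combined packets of different computations (coded multicast) cuts the number of transmitted packets.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # gradient dimension (assumed)
# Each storage segment holds several batches and is replicated on 2 of the 3 workers.
segments = {frozenset({1, 2}): [rng.normal(size=d) for _ in range(3)],
            frozenset({1, 3}): [rng.normal(size=d) for _ in range(3)],
            frozenset({2, 3}): [rng.normal(size=d) for _ in range(3)]}

def partial_grad(job, batch):
    """Hypothetical per-batch partial gradient of computation `job`."""
    return job * batch                   # any linear map suffices for the demo

def combined(job, seg):
    """Compression step: sum all per-batch partial results of one computation
    on one storage segment (valid because the final reduction is linear)."""
    return sum(partial_grad(job, b) for b in segments[seg])

# Coding step: worker 1 stores segments {1,2} and {1,3}, so it can form the
# combined packet that worker 2 is missing and the one that worker 3 is
# missing, and multicast a single coded packet useful to both at once.
pkt_for_2 = combined(2, frozenset({1, 3}))   # worker 2 lacks segment {1,3}
pkt_for_3 = combined(3, frozenset({1, 2}))   # worker 3 lacks segment {1,2}
multicast = pkt_for_2 + pkt_for_3            # one coded packet instead of two

# Decoding: worker 2 also stores segment {1,2}, so it recomputes pkt_for_3
# locally and cancels it; worker 3 does the symmetric cancellation.
recovered_by_2 = multicast - combined(3, frozenset({1, 2}))
recovered_by_3 = multicast - combined(2, frozenset({1, 3}))
assert np.allclose(recovered_by_2, pkt_for_2)
assert np.allclose(recovered_by_3, pkt_for_3)
```

In this toy setting, the combining step shrinks three per-batch results into one packet per computation, and the coded multicast then halves the number of packets worker 1 must send; the paper characterizes how far this joint gain extends in general and shows it is asymptotically optimal for the distributed training application.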
