Article

Accelerating distributed machine learning with model compression and graph partition

Journal

Journal of Parallel and Distributed Computing

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jpdc.2023.04.006

Keywords

Data sparsity; Distributed machine learning; Graph partition; Parameter server framework


This paper proposes a method to reduce the communication cost of the parameter server framework in distributed training by compressing the model and jointly optimizing the data and parameter allocation. Experimental results show that this joint compression and allocation scheme efficiently reduces communication overhead for both linear and deep neural network models.
The rapid growth of the data and parameter sizes of machine learning models makes it necessary to improve the efficiency of distributed training. It is observed that the communication cost is usually the bottleneck of distributed training systems. In this paper, we focus on the parameter server framework, a widely deployed distributed learning framework. The frequent parameter pulls, pushes, and synchronizations among multiple machines generate a huge communication volume. We aim to reduce the communication cost of the parameter server framework. Compressing the training model and optimizing the data and parameter allocation are two existing approaches to reducing communication costs. We jointly consider these two approaches and propose to optimize the data and parameter allocation after compression. Unlike in previous allocation schemes, the data sparsity property may no longer hold after compression, which brings additional opportunities and challenges to the allocation problem. We consider the allocation problem for both linear and deep neural network (DNN) models, and propose fixed and dynamic partition algorithms accordingly. Experiments on real-world datasets show that our joint compression and partition scheme can efficiently reduce the communication overhead for both linear and DNN models. © 2023 Published by Elsevier Inc.
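To make the communication pattern concrete, below is a minimal, hypothetical sketch in Python with NumPy (not the paper's implementation; all names such as ToyParameterServer and top_k_sparsify are stand-ins) of one synchronous parameter-server round in which workers push top-k-sparsified gradients and pull only the parameter slices they need, so far fewer entries per worker cross the network than in the dense case:

import numpy as np

def top_k_sparsify(grad, k):
    # Keep only the k largest-magnitude entries; transmit (indices, values)
    # instead of the full dense gradient.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

class ToyParameterServer:
    # Hypothetical single-process stand-in for a parameter server shard.
    def __init__(self, dim):
        self.weights = np.zeros(dim)

    def push(self, idx, values, lr=0.1):
        # Apply a sparse gradient update; only k entries "cross the network".
        self.weights[idx] -= lr * values

    def pull(self, idx):
        # Workers pull only the parameter slices they need.
        return self.weights[idx]

# One synchronous round with two workers on a shared 1000-dim model.
server = ToyParameterServer(dim=1000)
rng = np.random.default_rng(0)
for _ in range(2):
    local_grad = rng.normal(size=1000)            # stand-in for a computed gradient
    idx, vals = top_k_sparsify(local_grad, k=50)  # ~20x less push traffic than dense
    server.push(idx, vals)
print(server.pull(np.arange(10)))

The fixed-partition side can likewise be illustrated with a generic greedy load-balancing heuristic that assigns parameter blocks to servers by access frequency; this is a common stand-in for such allocation problems, not the paper's algorithm:

import heapq

def greedy_fixed_partition(block_access_counts, num_servers):
    # Greedily assign each parameter block to the least-loaded server,
    # placing heavy blocks first so per-server traffic stays balanced.
    heap = [(0, s) for s in range(num_servers)]  # (load, server id)
    heapq.heapify(heap)
    assignment = {}
    for block, count in sorted(block_access_counts.items(),
                               key=lambda kv: -kv[1]):
        load, server = heapq.heappop(heap)
        assignment[block] = server
        heapq.heappush(heap, (load + count, server))
    return assignment

# Example: 5 parameter blocks with skewed access counts, 2 servers.
print(greedy_fixed_partition({"b0": 90, "b1": 40, "b2": 30, "b3": 20, "b4": 10}, 2))

A dynamic partition, as opposed to this static assignment, would presumably re-run such an allocation as access patterns evolve during training.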

