Article

Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

Journal

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Volume E104D, Issue 8, Pages 1332-1339

Publisher

IEICE-INST ELECTRONICS INFORMATION COMMUNICATION ENGINEERS
DOI: 10.1587/transinf.2020EDP7201

Keywords

distributed deep learning; high performance computing (HPC); optical circuit switching; hybrid switching

Funding

  1. New Energy and Industrial Technology Development Organization (NEDO) [JPNP16007]


The study investigated the benefit of increasing inter-node link bandwidth by using hybrid switching systems, and found that optical switching can speed up the data transfer of synchronous data-parallelism training. Simulation results demonstrated that this approach can accelerate the training time of deep learning applications, especially in large-scale scenarios.
Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck because of its higher latency and lower link bandwidth compared with intra-node communication. Although several communication techniques have been proposed to cope with this problem, all of them aim to mitigate the large-message-size issue rather than address the limitations of the inter-node network itself. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We find that the typical data transfer in synchronous data-parallel training is long-lived and rarely changes, and can therefore be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach shortens the training time of deep learning applications, especially at large scale.
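To make the bandwidth argument concrete, the following is a minimal sketch (not from the paper) of the classic ring all-reduce cost model, comparing per-iteration gradient synchronization time under an electrical packet-switched link and a higher-bandwidth optical circuit. All numeric values (gradient size, node count, bandwidths, latency) are illustrative assumptions, not measurements from the study.

```python
# Toy ring all-reduce cost model: 2*(p-1) communication steps,
# each moving roughly msg_bytes/p over the inter-node link.
# Illustrates why raising link bandwidth (e.g. via optical circuit
# switching) shortens synchronous data-parallel training steps.

def ring_allreduce_time(msg_bytes, nodes, bandwidth_bps, latency_s):
    """Estimated time (seconds) to all-reduce msg_bytes across nodes."""
    steps = 2 * (nodes - 1)
    per_step = latency_s + (msg_bytes / nodes) / bandwidth_bps
    return steps * per_step

grad_bytes = 100e6  # assumed 100 MB of gradients per iteration
nodes = 64          # assumed cluster size

# 10 Gb/s electrical packet switching vs. 100 Gb/s optical circuit
eps = ring_allreduce_time(grad_bytes, nodes, 10e9 / 8, latency_s=5e-6)
ocs = ring_allreduce_time(grad_bytes, nodes, 100e9 / 8, latency_s=5e-6)

print(f"EPS: {eps * 1e3:.1f} ms, OCS: {ocs * 1e3:.1f} ms")
```

Because the transfer pattern of synchronous training is stable across iterations, the one-time circuit-setup latency of optical switching (ignored in this sketch) can be amortized over many such all-reduce rounds.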
