Article

Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications

Journal

JOURNAL OF LIGHTWAVE TECHNOLOGY
Volume 41, Issue 12, Pages 3737-3749

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JLT.2023.3276588

Keywords

Distributed deep learning; collective communication; silicon photonics; optical interconnect

Summary

SiPAC, a Silicon Photonic Accelerated Compute cluster, accelerates distributed deep learning training through embedded photonics and a novel collective algorithm, expediting the collective communications common in DL training and easing the communication bottleneck.

Abstract

As Deep Learning (DL) models grow larger and more complex, training jobs are increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs. Each CU processes a sub-part of the model and synchronizes results with the others. Communication among these CUs has emerged as a key bottleneck in the training process. In this work, we present SiPAC, a Silicon Photonic Accelerated Compute cluster. SiPAC accelerates distributed DL training by means of two co-designed components: a photonic physical layer and a novel collective algorithm. The physical layer exploits embedded photonics to bring peta-scale I/O directly to the CUs of a DL-optimized cluster and uses resonator-based optical wavelength selectivity to realize hardware multicasting. The collective algorithm builds on this hardware multicasting primitive. The combination expedites a variety of collective communications commonly employed in DL training and has the potential to drastically ease communication bottlenecks. We demonstrate the feasibility of the SiPAC architecture through 1) an optical testbed experiment in which an array of comb laser wavelengths is shuffled by a cascaded ring switch, with each ring selecting and forwarding multiple wavelengths to increase the effective communication bandwidth, thereby demonstrating the hardware multicasting primitive, and 2) a four-GPU testbed running a realistic DL workload that achieves a 22% system-level performance improvement relative to a similarly sized leaf-spine topology. Large-scale simulations show that SiPAC achieves a 1.4x to 5.9x communication time reduction over state-of-the-art compute clusters for representative collective communications.
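
To make the multicast advantage concrete, the sketch below uses a simple alpha-beta cost model to contrast a conventional ring AllGather with an AllGather built on a one-round hardware multicast of the kind SiPAC's collective algorithm exploits. This is a minimal illustration, not the paper's method: the latency, line-rate, cluster-size, and shard-size constants are invented for the example.

```python
# Illustrative alpha-beta cost model contrasting a ring AllGather with an
# AllGather built on a hardware multicast primitive. All constants are
# hypothetical; they are not measurements from the SiPAC paper.

ALPHA = 1e-6      # assumed per-message latency, seconds
BETA = 1 / 50e9   # assumed seconds per byte (50 GB/s per channel)

def ring_allgather(p: int, shard_bytes: int) -> float:
    """Ring AllGather: p - 1 sequential rounds, one shard moved per round."""
    return (p - 1) * (ALPHA + shard_bytes * BETA)

def multicast_allgather(p: int, shard_bytes: int) -> float:
    """Multicast AllGather: every CU broadcasts its shard to all peers at
    once (one shard per wavelength), so a single round suffices. This
    optimistically assumes each CU can receive all p - 1 remote shards in
    parallel on separate wavelength channels."""
    return ALPHA + shard_bytes * BETA

if __name__ == "__main__":
    p, shard = 64, 16 * 2**20  # 64 CUs, 16 MiB shard per CU (assumed)
    t_ring = ring_allgather(p, shard)
    t_mcast = multicast_allgather(p, shard)
    print(f"ring allgather:      {t_ring * 1e3:8.3f} ms")
    print(f"multicast allgather: {t_mcast * 1e3:8.3f} ms")
    print(f"model speedup:       {t_ring / t_mcast:8.1f}x")
```

Under these assumptions the round count drops from p - 1 to 1. The toy model is deliberately optimistic, since it ignores receive-bandwidth limits, wavelength counts, and congestion; the paper's large-scale simulations, which account for such effects, report the more modest 1.4x to 5.9x reductions quoted above.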
