Article

Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications

Related references

Note: Only some of the references are listed.
Article Engineering, Electrical & Electronic

Petabit-Scale Silicon Photonic Interconnects With Integrated Kerr Frequency Combs

Anthony Rizzo et al.

Summary: Silicon photonics has great potential for improving optical interconnects in data centers and high-performance computers, enabling higher transmission rates and lower energy consumption. This study reviews recent progress in silicon photonic interconnects, with a focus on chip-scale Kerr frequency comb sources, and provides a comprehensive overview of scalable silicon photonic systems. Experimental results demonstrate the feasibility of volume manufacturing for this technology.

IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS (2023)

Article Computer Science, Hardware & Architecture

Impact of Synchronization Topology on DML Performance: Both Logical Topology and Physical Topology

Shuai Wang et al.

Summary: This paper investigates the impact of synchronization topology on distributed machine learning performance, proposes a hierarchical parameter synchronization topology called HiPS, and compares different physical network topologies, finding that the HiPS+BCube combination offers the best performance.

IEEE-ACM TRANSACTIONS ON NETWORKING (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

Dheevatsa Mudigere et al.

Summary: This paper introduces Neo, a software-hardware co-designed system for high-performance distributed training of large-scale DLRMs. Neo achieves high performance, memory efficiency, and communication optimization through a 4D parallelism strategy and critical system optimizations, outperforming existing systems in training performance.

PROCEEDINGS OF THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22) (2022)

Article Engineering, Electrical & Electronic

Silicon Photonic Flex-LIONS for Reconfigurable Multi-GPU Systems

Marjan Fariborz et al.

Summary: This study proposes a solution to interconnect multiple GPUs using Flex-LIONS optical technology, which can adapt the optical connectivity based on traffic demands, resulting in a reduction in execution time.

JOURNAL OF LIGHTWAVE TECHNOLOGY (2021)

Article Computer Science, Information Systems

Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

Thao-Nguyen Truong et al.

Summary: The study investigated the benefit of increasing inter-node link bandwidth by using hybrid switching systems, and found that optical switching can speed up the data transfer of synchronous data-parallelism training. Simulation results demonstrated that this approach can shorten the training time of deep learning applications, especially in large-scale scenarios.

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS (2021)

Article Engineering, Electrical & Electronic

X-NEST: A Scalable, Flexible, and High-Performance Network Architecture for Distributed Machine Learning

Yunfeng Lu et al.

Summary: This article discusses the importance of network interconnectivity for the performance of neural network model training in large-scale distributed machine learning systems. A scalable and high-performance network architecture called X-NEST is proposed, which can dynamically change its topology and number of connections based on traffic patterns to improve network performance and resource utilization.

JOURNAL OF LIGHTWAVE TECHNOLOGY (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Co-designing the Topology/Algorithm to Accelerate Distributed Training

Xiang Hou et al.

Summary: As deep learning has developed, deep neural network models have grown more complex, and training large models demands computing and memory resources that strain hardware training platforms. Distributed training platforms are being developed to address these challenges, with a focus on optimizing communication efficiency in interconnection networks to improve system performance.

19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021) (2021)

Proceedings Paper Computer Science, Information Systems

Efficient Sparse Collective Communication and its application to Accelerate Distributed Deep Learning

Jiawei Fei et al.

Summary: OmniReduce is an efficient streaming aggregation system that leverages sparsity to maximize effective bandwidth use by sending only non-zero data blocks, accelerating distributed training and providing better performance in network-bottlenecked scenarios.

SIGCOMM '21: PROCEEDINGS OF THE 2021 ACM SIGCOMM CONFERENCE (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms

Saeed Rashidi et al.

Summary: The study introduces a novel DL collective communication accelerator named ACE, which reduces the required memory bandwidth by freeing up the compute and memory resources at the endpoint. ACE increases the effective network bandwidth utilization by an average of 1.44x, leading to iteration-time speedups of 1.41x, 1.12x, and 1.13x for ResNet-50, GNMT, and DLRM, respectively, compared to the best baseline configuration.

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Communication Algorithm-Architecture Co-Design for Distributed Deep Learning

Jiayi Huang et al.

Summary: The study introduces an efficient all-reduce algorithm, MULTITREE, which achieves efficient and scalable communication operations under different interconnect topologies. Through the co-design of algorithm and architecture, it reduces communication time and training time effectively.

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) (2021)

Proceedings Paper Computer Science, Theory & Methods

A Scalable Multicast Hybrid Broadband Crossbar Wavelength Selective Switch For Datacenters

Akhilesh S. P. Khope et al.

Summary: The new switching architecture is wavelength selective and supports multicast, showing a 50% reduction in the number of elements, reduced loss, and similar crosstalk compared to a multi-wavelength selective crossbar switch.

2021 IEEE 11TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC) (2021)

Article Computer Science, Theory & Methods

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

Ang Li et al.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2020)

Article Computer Science, Theory & Methods

PSNet: Reconfigurable network topology design for accelerating parameter server architecture based distributed machine learning

Ling Liu et al.

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2020)

Article Computer Science, Hardware & Architecture

TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O

Mark Wade et al.

IEEE MICRO (2020)

Article Multidisciplinary Sciences

Ultra-dense optical data transmission over standard fibre with a single chip source

Bill Corcoran et al.

NATURE COMMUNICATIONS (2020)

Article Engineering, Electrical & Electronic

Silicon Photonic 2.5D Multi-Chip Module Transceiver for High-Performance Data Centers

Nathan C. Abrams et al.

JOURNAL OF LIGHTWAVE TECHNOLOGY (2020)

Proceedings Paper Computer Science, Hardware & Architecture

ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms

Saeed Rashidi et al.

2020 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS) (2020)

Article Engineering, Electrical & Electronic

Performance Model and Design Rules for Optical Systems Employing Low-Resolution DAC/ADC

Sylvain Almonacil et al.

JOURNAL OF LIGHTWAVE TECHNOLOGY (2020)

Article Engineering, Electrical & Electronic

A 128 Gb/s PAM4 Silicon Microring Modulator With Integrated Thermo-Optic Resonance Tuning

Jie Sun et al.

JOURNAL OF LIGHTWAVE TECHNOLOGY (2019)

Article Engineering, Electrical & Electronic

Scalable Microring-Based Silicon Clos Switch Fabric With Switch-and-Select Stages

Qixiang Cheng et al.

IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS (2019)

Article Optics

Turn-key, high-efficiency Kerr comb source

Bok Young Kim et al.

OPTICS LETTERS (2019)

Proceedings Paper Computer Science, Theory & Methods

On the Feasibility of Hybrid Electrical/Optical Switch Architecture for Large-Scale Training of Distributed Deep Learning

Thao Nguyen Truong et al.

PROCEEDINGS OF 2019 IEEE/ACM WORKSHOP ON PHOTONICS-OPTICS TECHNOLOGY ORIENTED NETWORKING, INFORMATION AND COMPUTING SYSTEMS (PHOTONICS2019) (2019)

Article Engineering, Electrical & Electronic

Design Space Exploration of Microring Resonators in Silicon Photonic Interconnects: Impact of the Ring Curvature

Meisam Bahadori et al.

JOURNAL OF LIGHTWAVE TECHNOLOGY (2018)

Article Computer Science, Hardware & Architecture

Optimization of collective communication operations in MPICH

R. Thakur et al.

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS (2005)