Proceedings Paper

Co-designing the Topology/Algorithm to Accelerate Distributed Training

Related references

Note: Only some of the references are listed.
Proceedings Paper Computer Science, Hardware & Architecture

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM+NS3 case-study with TCP/IP transport

Saeed Rashidi et al.

2020 IEEE Symposium on High-Performance Interconnects (HOTI 2020) (2020)

Proceedings Paper Computer Science, Hardware & Architecture

ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms

Saeed Rashidi et al.

2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2020)

Article Computer Science, Theory & Methods

Bandwidth optimal all-reduce algorithms for clusters of workstations

Pitch Patarasuk et al.

Journal of Parallel and Distributed Computing (2009)

Proceedings Paper Computer Science, Hardware & Architecture

GARNET: A Detailed On-Chip Network Model inside a Full-System Simulator

Niket Agarwal et al.

ISPASS 2009: IEEE International Symposium on Performance Analysis of Systems and Software (2009)

Article Computer Science, Hardware & Architecture

Optimization of collective communication operations in MPICH

R. Thakur et al.

International Journal of High Performance Computing Applications (2005)