3.8 Proceedings Paper

Efficient Sparse Collective Communication and its application to Accelerate Distributed Deep Learning

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3452296.3472904

Keywords

-

Funding

  1. King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [OSR-CRG2020-4382]
  2. China Scholarship Council (CSC)


OmniReduce is an efficient streaming aggregation system that leverages sparsity to maximize effective bandwidth use by sending only non-zero data blocks, accelerating distributed training and providing better performance in network-bottlenecked scenarios.
Efficient collective communication is crucial to parallel-computing applications such as distributed training of large-scale recommendation systems and natural language processing models. Existing collective communication libraries focus on optimizing operations for dense inputs, resulting in transmissions of many zeros when inputs are sparse. This runs counter to the current trend of increasing data sparsity in large models. We propose OmniReduce, an efficient streaming aggregation system that exploits sparsity to maximize effective bandwidth use by sending only non-zero data blocks. We demonstrate that this idea is beneficial and accelerates distributed training by up to 8.2x. Even at 100 Gbps, OmniReduce delivers 1.4-2.9x better performance for network-bottlenecked DNNs.
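
The core idea described in the abstract, shipping only the non-zero blocks of a sparse gradient and summing them at an aggregator, can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration: the helper names (encode_nonzero_blocks, aggregate_sparse), the block size, and the centralized summation loop are assumptions made for clarity, not OmniReduce's actual API or streaming protocol.

```python
import numpy as np

def encode_nonzero_blocks(tensor, block_size=256):
    """Split a flat gradient into fixed-size blocks and keep only the
    blocks that contain at least one non-zero value.

    Returns (indices, blocks): positions of the retained blocks and their
    contents. Illustrative helper only; not the OmniReduce API.
    """
    flat = tensor.ravel()
    pad = (-len(flat)) % block_size
    if pad:
        flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    blocks = flat.reshape(-1, block_size)
    nonzero = np.flatnonzero(np.any(blocks != 0, axis=1))
    return nonzero, blocks[nonzero]

def aggregate_sparse(worker_payloads, num_blocks, block_size=256):
    """Sum the block-sparse payloads from all workers, as an aggregator
    conceptually would, touching only the blocks that were sent."""
    result = np.zeros((num_blocks, block_size), dtype=np.float32)
    for indices, blocks in worker_payloads:
        result[indices] += blocks
    return result.ravel()

# Example: two workers with mostly-zero gradients; only two of the
# eight blocks (four per worker) are actually transmitted.
grad_a = np.zeros(1024, dtype=np.float32); grad_a[10] = 1.0
grad_b = np.zeros(1024, dtype=np.float32); grad_b[700] = 2.0
payloads = [encode_nonzero_blocks(grad_a), encode_nonzero_blocks(grad_b)]
summed = aggregate_sparse(payloads, num_blocks=1024 // 256)
```

With sparse inputs, each worker transmits only the indices and contents of its non-zero blocks rather than the full dense tensor, which is the source of the bandwidth savings the abstract reports.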

