3.8 Proceedings Paper

Efficient Sparse Collective Communication and its application to Accelerate Distributed Deep Learning

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3452296.3472904

Keywords

-

Funding

  1. King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [OSR-CRG2020-4382]
  2. China Scholarship Council (CSC)


OmniReduce is an efficient streaming aggregation system that leverages sparsity to maximize effective bandwidth use by sending only non-zero data blocks, accelerating distributed training and providing better performance in network-bottlenecked scenarios.
Efficient collective communication is crucial to parallel-computing applications such as distributed training of large-scale recommendation systems and natural language processing models. Existing collective communication libraries focus on optimizing operations for dense inputs, resulting in transmissions of many zeros when inputs are sparse. This runs counter to the current trend of increasing data sparsity in large models. We propose OmniReduce, an efficient streaming aggregation system that exploits sparsity to maximize effective bandwidth use by sending only non-zero data blocks. We demonstrate that this idea is beneficial and accelerates distributed training by up to 8.2x. Even at 100 Gbps, OmniReduce delivers 1.4-2.9x better performance for network-bottlenecked DNNs.
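
The core idea described in the abstract, shipping only the non-zero blocks of a sparse gradient and summing them at an aggregator, can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration: the helper names (encode_nonzero_blocks, aggregate_sparse), the block size, and the centralized summation loop are assumptions made for clarity, not OmniReduce's actual API or streaming protocol.

```python
import numpy as np

def encode_nonzero_blocks(tensor, block_size=256):
    """Split a flat gradient into fixed-size blocks and keep only the
    blocks that contain at least one non-zero value.

    Returns (indices, blocks): positions of the retained blocks and their
    contents. Illustrative helper only; not the OmniReduce API.
    """
    flat = tensor.ravel()
    pad = (-len(flat)) % block_size
    if pad:
        flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    blocks = flat.reshape(-1, block_size)
    nonzero = np.flatnonzero(np.any(blocks != 0, axis=1))
    return nonzero, blocks[nonzero]

def aggregate_sparse(worker_payloads, num_blocks, block_size=256):
    """Sum the block-sparse payloads from all workers, as an aggregator
    conceptually would, touching only the blocks that were sent."""
    result = np.zeros((num_blocks, block_size), dtype=np.float32)
    for indices, blocks in worker_payloads:
        result[indices] += blocks
    return result.ravel()

# Example: two workers with mostly-zero gradients; only two of the
# eight blocks (four per worker) are actually transmitted.
grad_a = np.zeros(1024, dtype=np.float32); grad_a[10] = 1.0
grad_b = np.zeros(1024, dtype=np.float32); grad_b[700] = 2.0
payloads = [encode_nonzero_blocks(grad_a), encode_nonzero_blocks(grad_b)]
summed = aggregate_sparse(payloads, num_blocks=1024 // 256)
```

With sparse inputs, each worker transmits only the indices and contents of its non-zero blocks rather than the full dense tensor, which is the source of the bandwidth savings the abstract reports.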

