☆ 4.7 Article

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)

期刊

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

卷 30, 期 3, 页码 575-588

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TPDS.2018.2867222

关键词

Broadcast; deep learning; hardware multicast; GPU; GPUDirect RDMA; heterogeneous broadcast; streaming

类别

Computer Science, Theory & Methods Engineering, Electrical & Electronic

资金

United States Department of Defense (DOD) High Performance Computing Modernization Program (HPCMP) User Productivity Enhancement and Technology Transfer (PETTT) [GS04T09DBC0017]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Broadcast is a widely used operation in many streaming and deep learning applications to disseminate large amounts of data on emerging heterogeneous High-Performance Computing (HPC) systems. However, traditional broadcast schemes do not fully utilize hardware features for Graphics Processing Unit (GPU)-based applications. In this paper, a model-oriented analysis is presented to identify performance bottlenecks of existing broadcast schemes on GPU clusters. Next, streaming-based broadcast schemes are proposed to exploit InfiniBand hardware multicast (IB-MCAST) and NVIDIA GPUDirect technology for efficient message transmission. The proposed designs are evaluated in the context of using Message Passing Interface (MPI) based benchmarks and applications. The experimental results indicate improved scalability and up to 82 percent reduction of latency compared to the state-of-the-art solutions in the benchmark-level evaluation. Furthermore, compared to the state-of-the-art, the proposed design yields stable higher throughput for a synthetic streaming workload, and 1.3x faster training time for a deep learning framework.

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast

期刊

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast

期刊

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文