4.6 Article

BPCM: A Flexible High-Speed Bypass Parallel Communication Mechanism for GPU Cluster

Journal

IEEE ACCESS
Volume 8, Pages 103256-103272

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.2999096

Keywords

DPDK; GPU cluster; multi-core; multi-NIC; data link layer; bypass parallel communication

Funding

  1. National Natural Science Foundation of China [61572325, 60970012]
  2. Ministry of Education Doctoral Fund of Ph.D. Supervisor of China [20113120110008]
  3. Shanghai Key Science and Technology Project in Information Technology Field [14511107902, 16DZ1203603]
  4. Shanghai Leading Academic Discipline Project [XTKX2012]
  5. Shanghai Engineering Research Center Project [GCZX14014, C14001]
  6. Intel Asia Pacific Research and Development Center

As the computational tasks tackled by artificial intelligence grow more complex, machine learning models continue to expand in scale, and both the data volume and the frequency of parameter synchronization increase. As a result, communication bandwidth within the GPU cluster becomes the main bottleneck for distributed model training. Many existing solutions have not been widely adopted because they require specialized equipment, are costly, and are difficult to use. To address this problem, this paper proposes a multi-NIC bypass parallel communication mechanism based on Intel DPDK that increases bandwidth within the GPU cluster at lower cost and uses the idle CPU resources of the GPU server to accelerate data transmission. First, we propose a data transmission model based on multiple network cards and design a port load balancing algorithm to keep the load on those cards balanced. Second, we implement a model and algorithm for CPU multi-core scheduling to reduce CPU energy consumption, resource occupation, and the impact on other applications. Third, for scenarios with multiple applications, we design and implement a rate adjustment model and algorithm to ensure that applications share bandwidth fairly. Finally, experimental results show that the mechanism provides high bandwidth for GPU clusters using inexpensive network cards and delivers the aggregated bandwidth of multiple cards over a single connection, with high reliability and transmission efficiency; it is also simple to use and flexible to expand.
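The abstract does not describe how the port load balancing algorithm works, so the following is only a minimal illustrative sketch of one common approach: a greedy least-loaded policy that sends each packet to the network card with the fewest queued bytes. The policy itself, the `Port` class, and the `pick_port` and `distribute` functions are all assumptions made for illustration; they are not taken from the paper and do not use any DPDK API.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical stand-in for one network card's transmit queue."""
    name: str
    queued_bytes: int = 0


def pick_port(ports):
    # Assumed policy: choose the port with the fewest queued bytes.
    # min() returns the first minimal element, so ties go to the
    # earliest port in the list.
    return min(ports, key=lambda p: p.queued_bytes)


def distribute(packet_sizes, ports):
    """Assign each packet to the currently least-loaded port.

    Returns the list of port names in packet order and updates each
    port's queued_bytes as a side effect.
    """
    assignment = []
    for size in packet_sizes:
        port = pick_port(ports)
        port.queued_bytes += size
        assignment.append(port.name)
    return assignment


if __name__ == "__main__":
    nics = [Port("nic0"), Port("nic1")]
    print(distribute([1500, 1500, 500, 1500], nics))
    print([(p.name, p.queued_bytes) for p in nics])
```

A real implementation would also need to drain the queues as packets are sent and preserve ordering within a single connection, both of which this sketch leaves out.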
