Article; Proceedings Paper

FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures

Journal

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jpdc.2020.05.008

Keywords

HPC; MPI; Derived datatypes; CPU and GPU; NVIDIA DGX2

Funding

  1. NSF [ACI2007991, CNS-1513120, ACI-1450440, CCF-1565414, ACI1664137, ACI-1931537]

This paper addresses the challenges of MPI derived datatype processing and proposes FALCON-X, a Fast and Low-overhead Communication framework for optimized zero-copy intra-node derived datatype communication on emerging CPU/GPU architectures. We quantify performance bottlenecks such as memory layout translation and copy overheads for highly fragmented MPI datatypes, and propose novel pipelining and memoization-based designs to achieve efficient derived datatype communication. We also propose enhancements to the MPI standard to address its semantic limitations. Experimental evaluations show that the proposed designs significantly improve intra-node communication latency and bandwidth over state-of-the-art MPI libraries on modern CPU and GPU systems. Using representative application kernels such as MILC, WRF, NAS_MG, Specfem3D, and Stencils on three different CPU architectures and two different GPU systems, including DGX-2, we demonstrate up to 5.5x improvement on multi-core CPUs and 120x on the DGX-2 GPU system over state-of-the-art designs in other MPI libraries. © 2020 Elsevier Inc. All rights reserved.
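The abstract names two costs, memory layout translation and intermediate copies, and two remedies, memoization and zero-copy transfer. The sketch below illustrates those ideas in plain Python and is not the FALCON-X implementation. It assumes a layout shaped like `MPI_Type_vector` (a count of blocks, a block length, and a stride between block starts). The names `flatten_vector` and `copy_strided` are hypothetical. The layout is translated into contiguous byte segments once and cached, then each segment is copied directly from the source buffer to the destination without a staging (pack) buffer.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def flatten_vector(count, blocklength, stride, elem_size):
    """Translate a vector-style layout into (byte_offset, byte_length) segments.

    Hypothetical helper: `count` blocks of `blocklength` elements, with block
    starts `stride` elements apart. The result is cached, so repeated use of
    the same layout skips the translation step (memoization).
    """
    if stride == blocklength:
        # Blocks are adjacent, so the whole layout is one contiguous segment.
        return ((0, count * blocklength * elem_size),)
    return tuple(
        (i * stride * elem_size, blocklength * elem_size) for i in range(count)
    )


def copy_strided(src, dst, layout):
    """Copy each segment straight from src into a dense dst.

    No intermediate pack buffer is allocated; memoryview slices refer to the
    underlying buffers directly. Returns the number of bytes written.
    """
    src_view = memoryview(src)
    dst_view = memoryview(dst)
    pos = 0
    for offset, length in layout:
        dst_view[pos:pos + length] = src_view[offset:offset + length]
        pos += length
    return pos


if __name__ == "__main__":
    # A 4x4 row-major matrix of 1-byte elements; select its first column.
    matrix = bytes(range(16))
    column = bytearray(4)
    layout = flatten_vector(count=4, blocklength=1, stride=4, elem_size=1)
    copy_strided(matrix, column, layout)
    print(list(column))  # prints [0, 4, 8, 12]
```

A second call to `flatten_vector` with the same arguments returns the cached segment list, which is the kind of saving memoization targets when an application reuses one highly fragmented datatype across many messages.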
