Journal
JOURNAL OF SUPERCOMPUTING
Volume 78, Issue 15, Pages 17055-17073
Publisher
SPRINGER
DOI: 10.1007/s11227-022-04491-7
Keywords
3D FFT; GPU; Distributed; MPI; OpenMP
Funding
- Major Project on the Integration of Industry, Education and Research of Zhongshan [210610173898370]
This paper introduces an efficient 3D FFT framework for multi-GPU distributed-memory systems, which utilizes a hybrid programming model combining MPI and OpenMP for effective communication, and adopts an asynchronous strategy and fast parallel kernels for acceleration.
This paper introduces an efficient and flexible 3D FFT framework for state-of-the-art multi-GPU distributed-memory systems. In contrast to the traditional pure MPI implementation, our framework exploits these systems through a hybrid multi-GPU programming model that combines MPI with OpenMP to achieve effective communication. To accelerate intra-node communication, we adopt an asynchronous strategy that creates multiple streams and threads to reduce blocking time. Furthermore, we combine this scheme with a GPU-Aware MPI implementation so that GPU-to-GPU data transfers proceed without CPU involvement. We also optimize the local FFT and transpose steps with fast parallel kernels to accelerate the overall transform. Results show that our framework outperforms the state-of-the-art distributed 3D FFT library, achieving up to a 2x speedup on a single node and a 1.65x speedup on two nodes.
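The abstract mentions local FFTs and a transpose but does not spell out the data decomposition. The following is a minimal CPU-only sketch of one common scheme, assuming a slab (1D) decomposition. It is not the authors' framework. NumPy's FFT stands in for a GPU FFT library, a Python list stands in for MPI ranks, and the function name `slab_fft3d` is hypothetical. The nested loop in step 3 plays the role of the all-to-all exchange. In a real multi-GPU system this step would be MPI communication, which is where GPU-Aware MPI and asynchronous streams would apply.

```python
import numpy as np


def slab_fft3d(x: np.ndarray, nranks: int) -> np.ndarray:
    """Simulate a slab-decomposed distributed 3D FFT on one machine.

    Hypothetical illustration: each "rank" is a list entry, and the
    all-to-all transpose is done with array slicing instead of MPI.
    """
    nx, ny, nz = x.shape
    assert nx % nranks == 0 and ny % nranks == 0, "axes must divide evenly"
    bx, by = nx // nranks, ny // nranks

    # Step 1: rank r owns a slab of bx contiguous x-planes.
    slabs = [x[r * bx:(r + 1) * bx].astype(np.complex128) for r in range(nranks)]

    # Step 2: local 2D FFT over (y, z), which are fully resident on each rank.
    slabs = [np.fft.fftn(s, axes=(1, 2)) for s in slabs]

    # Step 3: all-to-all transpose. Rank r sends its block of y-columns
    # belonging to rank d; afterward rank d holds every x for its y-block.
    received = []
    for d in range(nranks):
        pieces = [slabs[r][:, d * by:(d + 1) * by, :] for r in range(nranks)]
        received.append(np.concatenate(pieces, axis=0))  # shape (nx, by, nz)

    # Step 4: local 1D FFT along x, now fully resident on each rank.
    received = [np.fft.fft(p, axis=0) for p in received]

    # Reassemble the global array (only needed to check the result here).
    return np.concatenate(received, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((8, 8, 4))
    # The distributed result should match a direct 3D FFT.
    print(np.allclose(slab_fft3d(a, nranks=4), np.fft.fftn(a)))  # True
```

Because the multidimensional DFT is separable, a 2D FFT over (y, z) followed by a 1D FFT over x equals the full 3D FFT. The transpose exists only to make each axis local before it is transformed, which is why communication cost dominates at scale.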