Article

An effective 3-D fast Fourier transform framework for multi-GPU accelerated distributed-memory systems

Journal

JOURNAL OF SUPERCOMPUTING
Volume 78, Issue 15, Pages 17055-17073

Publisher

SPRINGER
DOI: 10.1007/s11227-022-04491-7

Keywords

3D FFT; GPU; Distributed; MPI; OpenMP

Funding

  1. Major Project on the Integration of Industry, Education and Research of Zhongshan [210610173898370]


This paper introduces an efficient 3D FFT framework for multi-GPU distributed-memory systems, which utilizes a hybrid programming model combining MPI and OpenMP for effective communication, and adopts an asynchronous strategy and fast parallel kernels for acceleration.
This paper introduces an efficient and flexible 3D FFT framework for state-of-the-art multi-GPU distributed-memory systems. In contrast to the traditional pure-MPI implementation, the framework exploits these systems through a hybrid multi-GPU programming model that combines MPI with OpenMP to achieve effective communication. To accelerate intra-node communication, an asynchronous strategy creates multiple streams and threads to reduce blocking time. Furthermore, we combine our scheme with a GPU-aware MPI implementation to perform GPU-to-GPU data transfers without CPU involvement. We also optimize the local FFT and transpose steps with fast parallel kernels to accelerate the overall transform. Results show that our framework outperforms the state-of-the-art distributed 3D FFT library, achieving speedups of up to 2x on a single node and up to 1.65x on two nodes.
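
The abstract mentions a local FFT, a transpose, and inter-process communication, but does not spell out how these stages fit together. The sketch below illustrates one common arrangement, a slab decomposition, in plain Python. It is an assumption for illustration only, not the paper's implementation: the function names (`dft`, `local_fft_yz`, `global_transpose`, `local_fft_x`, `distributed_fft3d`) are hypothetical, the "ranks" are simulated as list entries rather than MPI processes or GPUs, and a naive O(n^2) DFT stands in for a fast local kernel.

```python
import cmath

def dft(seq):
    """Naive 1-D DFT; a stand-in for a fast local FFT kernel."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def local_fft_yz(slab):
    """Rank-local step: transform slab[x][y][z] along z, then along y."""
    out = [[dft(row) for row in plane] for plane in slab]   # along z (new lists)
    for plane in out:
        ny, nz = len(plane), len(plane[0])
        for z in range(nz):
            col = dft([plane[y][z] for y in range(ny)])     # along y
            for y in range(ny):
                plane[y][z] = col[y]
    return out

def global_transpose(slabs, n_ranks):
    """Simulated all-to-all: x-slabs are redistributed as y-slabs in [y][x][z] layout."""
    full = [plane for slab in slabs for plane in slab]      # full[x][y][z]
    nx, ny = len(full), len(full[0])
    step = ny // n_ranks
    return [[[list(full[x][y]) for x in range(nx)]
             for y in range(r * step, (r + 1) * step)]
            for r in range(n_ranks)]

def local_fft_x(slab):
    """Rank-local step after the transpose: transform slab[y][x][z] along x."""
    for plane in slab:
        nx, nz = len(plane), len(plane[0])
        for z in range(nz):
            col = dft([plane[x][z] for x in range(nx)])
            for x in range(nx):
                plane[x][z] = col[x]
    return slab

def distributed_fft3d(data, n_ranks):
    """3-D DFT of data[x][y][z]; assumes n_ranks divides both nx and ny."""
    nx = len(data)
    step = nx // n_ranks
    slabs = [data[r * step:(r + 1) * step] for r in range(n_ranks)]  # scatter by x
    slabs = [local_fft_yz(s) for s in slabs]
    slabs = global_transpose(slabs, n_ranks)
    slabs = [local_fft_x(s) for s in slabs]
    yxz = [plane for s in slabs for plane in s]                      # gather, [y][x][z]
    return [[yxz[y][x] for y in range(len(yxz))] for x in range(len(yxz[0]))]
```

In a real multi-GPU setting, the two local steps would typically run as GPU kernels and `global_transpose` would become collective communication between processes; the sketch only shows the order of the stages and the data layout change between them.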
