Article

Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition

PARALLEL COMPUTING (2013)

Journal

PARALLEL COMPUTING

Volume 39, Issue 1, Pages 58-77

Publisher

ELSEVIER

DOI: 10.1016/j.parco.2012.12.002

Keywords

FFT; 2D decomposition; Parallel computing

Category

Computer Science, Theory & Methods

Funding

National Science Foundation [OCI-0904534, ATM-0730766, CRI-0958512, OCI-1053575]
National Center for Atmospheric Research (NCAR)
National Science Foundation
Office of Science of the U.S. Department of Energy [DE-AC02-05CH11231]
Directorate for Computer & Information Science & Engineering
Office of Advanced Cyberinfrastructure (OAC) [0904534] (funding source: National Science Foundation)
Division of Computer and Network Systems
Directorate for Computer & Information Science & Engineering [0958512] (funding source: National Science Foundation)

Abstract

3D FFT is computationally intensive and at the same time requires global or collective communication patterns. The efficient implementation of FFT on extreme-scale computers is one of the grand challenges in scientific computing. On parallel computers with distributed memory, different domain decompositions are possible to scale 3D FFT computation. In this paper, we argue that 2D domain decomposition is likely the best approach in terms of using a very large number of processors with reasonable data communication overhead. Specifically, we extend the data communication approach of Dmitruk et al. (2001) [21], previously used for 1D domain decomposition, to 2D domain decomposition. A thorough quantitative analysis of the code performance is undertaken for different problem sizes and numbers of processors, including scalability, load balance, and dependence on subdomain configuration (i.e., different numbers of subdomains in the two decomposed directions for a fixed total number of subdomains). We show that our proposed approach is faster than the existing attempts at 2D decomposition of 3D FFTs by Pekurovsky (2007) [23] (p3dfft), Takahashi (2009) [24], and Li and Laizet (2010) [25] (2decomp.org), especially for large problem sizes and large numbers of processors (our strategy is 28% faster than Pekurovsky's scheme, its closest competitor). We also show theoretically that our scheme performs better than the approach by Nelson et al. (1993) [22] up to a certain number of processors, beyond which latency becomes an issue. We demonstrate that the speedup scales almost linearly with the number of processors before it saturates. The execution times on different processors differ by less than 5%, showing excellent load balance. We further partition the execution time into computation, communication, and data copying related to the transpose operation, to understand how the relative percentage of the communication time increases with the number of processors. Finally, a theoretical complexity analysis is carried out to predict the scalability and its saturation. The complexity analysis indicates that 2D domain decomposition will make it feasible to run a large 3D FFT on scalable computers with several hundred thousand processors. © 2012 Elsevier B.V. All rights reserved.
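
The idea of a 2D (pencil) decomposition can be illustrated with a small serial sketch. It is not the authors' code and does not reproduce the paper's data communication approach. Ranks are simulated as dictionary entries, a naive DFT stands in for an optimized FFT library, and the order of the two transposes is an assumption. All function names (`dft`, `block`, `transform_axis`, `redistribute`, `pencil_dft3`, `direct_dft3`) are hypothetical. The sketch shows only the general pattern: a 1D transform along the locally complete axis, followed by a transpose confined to one row or column of the `py` x `pz` process grid.

```python
import cmath
from itertools import product


def dft(line):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized FFT library call."""
    n = len(line)
    return [sum(line[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def block(i, n, p):
    """Block (0..p-1) that owns index i when n points are split over p ranks."""
    return i * p // n


def transform_axis(local, axis, n):
    """Apply the 1D DFT along `axis` to every line stored in one rank's data."""
    lines = {}
    for idx, v in local.items():
        key = idx[:axis] + idx[axis + 1:]          # the two indices held fixed
        lines.setdefault(key, [None] * n)[idx[axis]] = v
    out = {}
    for key, line in lines.items():
        for k, v in enumerate(dft(line)):
            out[key[:axis] + (k,) + key[axis:]] = v
    return out


def redistribute(ranks, owner, shared):
    """Model a transpose: each element moves to the rank given by `owner`.

    `shared` is the grid coordinate that sender and receiver must have in
    common, i.e. the exchange stays inside one row or column of the grid.
    """
    new = {r: {} for r in ranks}
    for src, local in ranks.items():
        for idx, v in local.items():
            dst = owner(idx)
            assert dst[shared] == src[shared], "exchange left its subgroup"
            new[dst][idx] = v
    return new


def pencil_dft3(a, n, py, pz):
    """3D DFT of a[x][y][z] (n^3 points) on a simulated py x pz process grid."""
    # x-pencils: rank (r, c) holds all x, y-block r, z-block c.
    ranks = {rc: {} for rc in product(range(py), range(pz))}
    for x, y, z in product(range(n), repeat=3):
        ranks[(block(y, n, py), block(z, n, pz))][(x, y, z)] = a[x][y][z]
    ranks = {rc: transform_axis(loc, 0, n) for rc, loc in ranks.items()}
    # Transpose x <-> y among ranks with the same c: now x-block r, all y, z-block c.
    ranks = redistribute(
        ranks, lambda i: (block(i[0], n, py), block(i[2], n, pz)), shared=1)
    ranks = {rc: transform_axis(loc, 1, n) for rc, loc in ranks.items()}
    # Transpose y <-> z among ranks with the same r: now x-block r, y-block c, all z.
    ranks = redistribute(
        ranks, lambda i: (block(i[0], n, py), block(i[1], n, pz)), shared=0)
    ranks = {rc: transform_axis(loc, 2, n) for rc, loc in ranks.items()}
    out = {}
    for loc in ranks.values():
        out.update(loc)
    return out


def direct_dft3(a, n):
    """Reference: 3D DFT evaluated directly from its definition."""
    w = lambda j, k: cmath.exp(-2j * cmath.pi * j * k / n)
    return {(kx, ky, kz): sum(a[x][y][z] * w(x, kx) * w(y, ky) * w(z, kz)
                              for x, y, z in product(range(n), repeat=3))
            for kx, ky, kz in product(range(n), repeat=3)}


if __name__ == "__main__":
    n = 4
    a = [[[complex(x + 2 * y, z - x) for z in range(n)]
          for y in range(n)] for x in range(n)]
    ref = direct_dft3(a, n)
    for py, pz in [(2, 2), (4, 1), (1, 4), (2, 4)]:
        got = pencil_dft3(a, n, py, pz)
        err = max(abs(got[k] - ref[k]) for k in ref)
        print(f"{py} x {pz} ranks: max abs error {err:.2e}")
```

Varying `py` and `pz` for a fixed product corresponds to the "subdomain configuration" mentioned in the abstract. With `n` points per direction, this sketch accepts up to `n * n` simulated ranks, whereas a 1D (slab) decomposition would be limited to `n`.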
