4.7 Article

A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 185, Issue 1, Pages 153-164

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2013.08.028

Keywords

Fast Fourier transform; Multi-dimensional FFTs; Domain decomposition for FFTs; Domain decomposition method; Parallel FFTs

Funding

  1. Strategic Programs for Innovative Research (SPIRE)
  2. MEXT
  3. Computational Materials Science Initiative (CMSI)
  4. MEXT, Japan
  5. Grants-in-Aid for Scientific Research [22104005] Funding Source: KAKEN

Ask authors/readers for more resources

The fast Fourier transform (FFT) is undoubtedly an essential primitive that has been applied in various fields of science and engineering. In this paper, we present a decomposition method for the parallelization of multi-dimensional FFTs with the smallest communication amounts for all ranges of the number of processes compared to previously proposed methods. This is achieved by two distinguishing features: adaptive decomposition and transpose order awareness. In the proposed method, the FFT data is decomposed based on a row-wise basis that maps the multi-dimensional data into one-dimensional data, and translates the corresponding coordinates from multi-dimensions into one dimension so that the one-dimensional data can be divided and allocated equally to the processes using a block distribution. As a result and different from previous works that have the dimensions of decomposition pre-defined, our method can adaptively decompose the FFT data on the lowest possible dimensions depending on the number of processes. In addition, this row-wise decomposition provides plenty of alternatives in data transpose, and different transpose order results in different amounts of communication. We identify the best transpose orders with the smallest communication amounts for the 3-D, 4-D, and 5-D FFTs by analyzing all possible cases. We also develop a general parallel software package for the most popular 3-D FFT based on our method using the 2-D domain decomposition. Numerical results show good performance and scaling properties of our implementation in comparison with other parallel packages. Given both communication efficiency and scalability, our method is promising in the development of highly efficient parallel packages for the FFT. (C) 2013 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available