Article

Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces

Journal

ADVANCES IN ENGINEERING SOFTWARE
Volume 115, Pages 248-267

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.advengsoft.2017.09.010

Keywords

Phase field dislocation dynamics; Graphics processing unit (GPU); Compute unified device architecture (CUDA); OPENACC; Spectral methods

Funding

  1. U.S. National Science Foundation (NSF) [CMMI-1650641]
  2. Los Alamos National Laboratory Directed Research and Development (LDRD) [20160156ER]
  3. NSF [CMMI-1728224]


We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation combines portable OpenACC directives with CUFFT, Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library, so that the FFT computations within the PFDD formulation execute on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OpenACC-CUDA interoperability, in which calls to CUDA functions are replaced with OpenACC data regions for the host central processing unit (CPU) and the device (GPU). A comprehensive benchmark study compares a number of FFT routines: the Numerical Recipes FFT (FOURN), the Fastest Fourier Transform in the West (FFTW), and CUFFT, the last of which exploits the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified against the analytical solution for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu-Ni) interface. On a single Tesla K80 GPU, the ACCPFDD-CUFFT implementation offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the version of the code running on a 22-core Intel Xeon E5-2699 v4 CPU @ 2.20 GHz.
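The verification step mentioned in the abstract compares computed stresses with an analytical solution for an infinite edge dislocation. The abstract does not state which form of that solution was used, so the following is only a minimal sketch, assuming the classical isotropic-elasticity result for a straight edge dislocation lying along z with Burgers vector along +x. The function name `edge_dislocation_stress` and the copper-like material constants in the demo are illustrative assumptions, not values taken from the paper.

```python
import math


def edge_dislocation_stress(x, y, mu, nu, b):
    """Return (sxx, syy, sxy) at point (x, y).

    Classical isotropic solution for an infinite straight edge dislocation
    along z with Burgers vector b along +x (assumed form, not from the paper).
    mu: shear modulus, nu: Poisson's ratio, b: Burgers vector magnitude.
    """
    r2 = x * x + y * y
    if r2 == 0.0:
        raise ValueError("stress is singular at the dislocation core")
    d = mu * b / (2.0 * math.pi * (1.0 - nu))  # common prefactor
    sxx = -d * y * (3.0 * x * x + y * y) / (r2 * r2)
    syy = d * y * (x * x - y * y) / (r2 * r2)
    sxy = d * x * (x * x - y * y) / (r2 * r2)
    return sxx, syy, sxy


if __name__ == "__main__":
    # Illustrative copper-like constants (assumptions for this demo only).
    mu = 48.0e9      # shear modulus in Pa
    nu = 0.34        # Poisson's ratio
    b = 0.2556e-9    # Burgers vector magnitude in m
    # Sample the shear stress along the glide plane y = 0, where it decays as 1/x.
    for n in (5, 10, 20, 40):
        _, _, sxy = edge_dislocation_stress(n * b, 0.0, mu, nu, b)
        print(f"x = {n:3d} b   sxy = {sxy:.3e} Pa")
```

A numerical stress field sampled on a grid can be compared point by point with these values away from the core. In a periodic FFT-based solver, image dislocations also contribute to the stress, so agreement is expected only where those contributions are small or have been accounted for.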


