4.5 Article

Directive-based GPU programming for computational fluid dynamics

Journal

COMPUTERS & FLUIDS
Volume 114, Issue -, Pages 242-253

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compfluid.2015.03.008

Keywords

Graphics processing unit (GPU); Directive-based programming; OpenACC; Fortran; Finite-difference method

Funding

  1. Air Force Office of Scientific Research (AFOSR) Basic Research Initiative in the Computational Mathematics program

Ask authors/readers for more resources

Directive-based programming of graphics processing units (CPUs) has recently appeared as a.viable alter. native to using specialized low-level languages such as CUDA C and OpenCL for general-purpose GPU programming. This technique, which uses directive or pragma statements to annotate source codes written in traditional high-level languages, is designed to permit a unified code base to serve multiple computational platforms. In this work we analyze the popular OpenACC programming standard, as implemented by the PGI compiler suite, in order to evaluate its utility and performance potential in computational fluid dynamics (CFD) applications. We examine the process of applying the OpenACC Fortran API to a test CFD code that serves as a proxy for a full-scale research code developed at Virginia Tech; this test code is used to asses the performance improvements attainable for our CFD algorithm on common GPU platforms, as well as to determine the modifications that must be made to the original source code in order to run efficiently on the GPU. Performance is measured on several recent GPU architectures from NVIDIA and AMD (using both double and single precision arithmetic) and the accelerator code is bench-marked against a multithreaded CPU version constructed from the same Fortran source code using OpenMP directives. A single NVIDIA Kepler CPU card is found to perform approximately 20x faster than a single CPU core and more than 2x faster than a 16-core Xeon server. An analysis of optimization techniques for OpenACC reveals cases in which manual intervention by the programmer can improve accelerator performance by up to 30% over the default compiler heuristics, although these optimizations are relevant only for specific platforms. Additionally, the use of multiple accelerators with OpenACC is investigated, including an experimental high-level interface for multi-GPU programming that automates scheduling tasks across multiple devices. While the overall performance of the OpenACC code is found to be satisfactory, we also observe some significant limitations and restrictions imposed by the OpenACC API regarding certain useful features of modern Fortran (2003/8); these are sufficient for us to conclude that it would not be practical to apply OpenACC to our full research code at this time due to the amount of refactoring required. (C) 2015 Elsevier Ltd. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available