Article

Acceleration of a Production-Level Unstructured Grid Finite Volume CFD Code on GPU

Journal

APPLIED SCIENCES-BASEL
Volume 13, Issue 10

Publisher

MDPI
DOI: 10.3390/app13106193

Keywords

unstructured-grid; CFD; shared memory parallelization; GPU; data racing


Abstract

Due to the complex topological relationships, poor data locality, and data-racing problems in unstructured CFD computing, efficiently parallelizing finite volume method algorithms in shared memory to exploit the hardware capabilities of many-core GPUs is a significant challenge. Based on a production-level unstructured CFD software package, three shared-memory parallel programming strategies, atomic operation, colouring, and reduction, were designed and implemented after a deep analysis of the code's computing behaviour and memory access patterns. Several data locality optimization methods, namely grid reordering, loop fusion, and multi-level memory access, were proposed. To address the inherently sequential nature of the LU-SGS solution, two methods based on cell colouring and hyperplanes were implemented. All the parallel methods and optimization techniques were comprehensively analysed and evaluated on three-dimensional grids of the M6 wing and the CHN-T1 aeroplane. The results show that the Cuthill-McKee grid renumbering and loop fusion optimization techniques improve memory access performance by 10%. The proposed reduction strategy, combined with multi-level memory access optimization, has a significant acceleration effect, speeding up the hot-spot subroutine with data races by a factor of three. Compared with the serial CPU version, the overall speed-up of the GPU code reaches 127; compared with the parallel CPU version with the same number of Message Passing Interface (MPI) ranks, the overall speed-up exceeds 30.
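The colouring strategy mentioned in the abstract avoids data races in face-based loops by partitioning faces into colours such that no two faces of the same colour touch the same cell; faces within a colour can then scatter flux contributions to cells concurrently, and colours are processed one after another. The sketch below is a minimal greedy face-colouring illustration, not the authors' implementation; the mesh representation (faces as `(left_cell, right_cell)` pairs) is an assumption for the example.

```python
def colour_faces(faces, n_cells):
    """Greedily colour faces so that no two faces sharing a cell
    receive the same colour.

    faces: list of (left_cell, right_cell) index pairs.
    Returns a list giving the colour of each face.
    """
    # Colours already taken at each cell by previously coloured faces.
    cell_colours = [set() for _ in range(n_cells)]
    face_colour = []
    for lc, rc in faces:
        used = cell_colours[lc] | cell_colours[rc]
        c = 0
        while c in used:  # smallest colour free at both cells
            c += 1
        face_colour.append(c)
        cell_colours[lc].add(c)
        cell_colours[rc].add(c)
    return face_colour

# A 1-D chain of 4 cells: adjacent faces share a cell, so they must
# alternate colours, while faces 0 and 2 can reuse the same colour.
faces = [(0, 1), (1, 2), (2, 3)]
print(colour_faces(faces, 4))  # → [0, 1, 0]
```

On a GPU, each colour would map to one race-free kernel launch over its faces, trading a single racy loop for a few smaller independent ones.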
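The Cuthill-McKee grid renumbering credited with part of the memory-access improvement relabels cells so that neighbouring cells get nearby indices, reducing the bandwidth of the cell-adjacency graph and improving locality. A minimal breadth-first sketch under an assumed adjacency-list mesh representation (again, an illustration, not the paper's code):

```python
from collections import deque

def cuthill_mckee(adj):
    """adj: adjacency list of the cell graph; returns a new cell ordering
    that tends to place connected cells close together."""
    n = len(adj)
    visited = [False] * n
    order = []
    # Start from low-degree cells, as in the classic algorithm.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order

def bandwidth(adj, order):
    """Largest index distance between any pair of adjacent cells."""
    pos = {v: i for i, v in enumerate(order)}
    return max(abs(pos[v] - pos[w]) for v in range(len(adj)) for w in adj[v])

# A path of 5 cells labelled out of order: 0-2-4-1-3.
adj = [[2], [4, 3], [0, 4], [1], [2, 1]]
print(bandwidth(adj, list(range(5))))    # natural ordering → 3
print(bandwidth(adj, cuthill_mckee(adj)))  # after renumbering → 1
```

A smaller bandwidth means a cell and its neighbours land in nearby memory, which is the locality effect the abstract attributes to grid reordering.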
