Proceedings Paper

Accelerating unstructured-grid CFD algorithms on NVIDIA and AMD GPUs

PROCEEDINGS OF IA3 2021: 2021 IEEE/ACM 11TH WORKSHOP ON IRREGULAR APPLICATIONS: ARCHITECTURES AND ALGORITHMS (2021)

Journal

PROCEEDINGS OF IA3 2021: 2021 IEEE/ACM 11TH WORKSHOP ON IRREGULAR APPLICATIONS: ARCHITECTURES AND ALGORITHMS

Volume -, Issue -, Pages 19-26

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/IA354616.2021.00010

Keywords

Unstructured grid CFD; GPU Performance; Performance Portability; AMD ROCm; Atomic Update

Categories

Computer Science, Hardware & Architecture; Computer Science, Software Engineering; Computer Science, Theory & Methods

Funding

NASA Langley Research Center CIF/IRAD program
NASA Transformational Tools and Technologies (TTT) Project of the Transformative Aeronautics Concepts Program under the Aeronautics Research Mission Directorate
National Institute of Aerospace [NNLO9AAOOA]

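Summary

Optimization methods were studied to improve the GPU efficiency of kernels dominated by atomic updates, with a primary focus on the AMD MI100 GPU. Results on the NVIDIA GPUs were mixed: the A100 generally benefited, while the V100 was often slowed. Techniques combining register shuffling and on-chip shared memory were used to aggregate results among threads before atomically updating global memory.

Abstract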

Computational performance of the FUN3D unstructured-grid computational fluid dynamics (CFD) application on GPUs depends strongly on the efficiency of the floating-point atomic updates needed to support irregular cell-, edge-, and node-based data access patterns in massively parallel GPU environments. We examine several optimization methods to improve the GPU efficiency of performance-critical kernels that are dominated by atomic update costs on NVIDIA V100/A100 and AMD CDNA MI100 GPUs. Optimization on the AMD MI100 GPU was of primary interest because similar hardware will be used in the upcoming Frontier supercomputer. Techniques combining register shuffling and on-chip shared memory were used to transpose and/or aggregate results among collaborating GPU threads before atomically updating global memory. These techniques, along with algorithmic optimizations to reduce the update frequency, reduced the run time of select kernels on the MI100 GPU by a factor of 2.5 to 6.0 relative to atomically updating global memory directly. The performance impact on the NVIDIA GPUs was mixed: the V100 was often slowed by the register-based aggregation/transposition techniques, while the A100 generally benefited from them, though to a lesser extent than measured on the MI100 GPU. Overall, both the V100 and A100 GPUs outperformed the MI100 GPU on kernels dominated by double-precision atomic updates; however, the techniques demonstrated here reduced the performance gap and improved MI100 performance.
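The idea of aggregating results among collaborating threads before an atomic update can be illustrated with a small CPU emulation. This is only a sketch under stated assumptions, not the authors' GPU implementation: the function names `scatter_direct` and `scatter_aggregated` are hypothetical, a Python loop stands in for GPU threads, a dictionary stands in for register shuffles or shared memory, and each increment of `atomics` represents one atomic update to global memory.

```python
WARP_SIZE = 32  # assumed group width; NVIDIA warps have 32 lanes, AMD CDNA wavefronts have 64


def scatter_direct(targets, values, n_nodes):
    """Baseline: every contribution issues its own (emulated) atomic add."""
    out = [0.0] * n_nodes
    atomics = 0
    for t, v in zip(targets, values):
        out[t] += v      # stands in for one atomic add to global memory
        atomics += 1
    return out, atomics


def scatter_aggregated(targets, values, n_nodes, width=WARP_SIZE):
    """Aggregate within each group of `width` lanes, then update once per distinct target."""
    out = [0.0] * n_nodes
    atomics = 0
    for start in range(0, len(targets), width):
        # Emulates the in-group reduction that a GPU kernel would perform
        # with register shuffles and/or on-chip shared memory.
        partial = {}
        lanes = zip(targets[start:start + width], values[start:start + width])
        for t, v in lanes:
            partial[t] = partial.get(t, 0.0) + v
        # One emulated atomic add per distinct target seen by the group.
        for t, s in partial.items():
            out[t] += s
            atomics += 1
    return out, atomics


if __name__ == "__main__":
    # Hypothetical workload: 64 contributions, 4 consecutive ones per node.
    targets = [i // 4 for i in range(64)]
    values = [0.5] * 64
    direct, n_direct = scatter_direct(targets, values, 16)
    aggregated, n_aggregated = scatter_aggregated(targets, values, 16)
    print(direct == aggregated, n_direct, n_aggregated)
```

In this emulation the saving depends entirely on how many lanes in a group share a target, which is why the ordering of the data matters. The sketch only counts updates and says nothing about actual run time on a V100, A100, or MI100.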
