4.5 Article

Streaming techniques: revealing the natural concurrency of the lattice Boltzmann method

Journal

JOURNAL OF SUPERCOMPUTING
Volume 77, Issue 10, Pages 11911-11929

Publisher

SPRINGER
DOI: 10.1007/s11227-021-03762-z

Keywords

LRnLA algorithms; Lattice-Boltzmann method; Parallel computing; CUDA GPU

Funding

  1. Russian Science Foundation [18-71-10004]


This paper explores the possible data flow arrangements at the streaming step, defines a new propagation scheme, and implements it as GPU program code using the locally recursive non-locally asynchronous (LRnLA) algorithm construction method, achieving performance of up to 10 GLUps on a single NVIDIA GeForce RTX 3090 GPU.
The lattice Boltzmann method (LBM) produces stencil numerical schemes that are memory-bound, so performance can be multiplied by increasing the arithmetic intensity. This paper explores the possible data flow arrangements at the streaming step, with the aim of developing the most efficient algorithms and implementations of LBM schemes. The locally recursive non-locally asynchronous (LRnLA) algorithm construction method is used for this purpose. The method is based on an analysis of the dependency graph of the task in dD1T Minkowski space. Well-known propagation patterns are illustrated and analyzed. Based on their advantages and drawbacks, a new propagation scheme is defined. The best propagation scheme constructed with this method is implemented as GPU program code. A description of the code and the performance results are provided. The obtained performance is up to 10 GLUps on a single NVIDIA GeForce RTX 3090 GPU.
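
To make the streaming step and the memory-bound argument concrete, here is a minimal sketch in plain Python. It is not the paper's scheme or code. The D1Q3 lattice, the periodic boundaries, the "pull" propagation pattern, and the names `VELOCITIES`, `stream_pull` and `bandwidth_bound_glups` are all illustrative assumptions. The bandwidth estimate further assumes a scheme that loads and stores every population once per time step.

```python
# Illustrative sketch only: D1Q3 lattice with periodic boundaries (assumed).
# Velocities of the three populations: rest, +1, -1.
VELOCITIES = (0, 1, -1)


def stream_pull(f):
    """Return the lattice after one streaming step in the 'pull' pattern.

    f[q][x] is population q at node x. Each node reads the value that
    arrives from its upstream neighbour at x - c_q (periodic wrap-around).
    """
    n = len(f[0])
    return [
        [f[q][(x - c) % n] for x in range(n)]
        for q, c in enumerate(VELOCITIES)
    ]


def bandwidth_bound_glups(bandwidth_gb_s, q, bytes_per_value):
    """Rough upper bound, in giga lattice updates per second, for a scheme
    that reads and writes all q populations of a node once per update."""
    bytes_per_update = 2 * q * bytes_per_value  # one load + one store each
    return bandwidth_gb_s / bytes_per_update


if __name__ == "__main__":
    f = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    print(stream_pull(f))
    # Hypothetical inputs: 936 GB/s nominal bandwidth, 19 populations
    # per node (D3Q19), 4-byte values. None of these is from the abstract.
    print(bandwidth_bound_glups(936.0, 19, 4))
```

Under those hypothetical inputs the one-step-per-sweep bound is roughly 6 GLUps, below the 10 GLUps reported above. That would fit the abstract's point that raising arithmetic intensity, for example by advancing several time steps per data load, can push performance past the naive bandwidth limit. The lattice and precision actually used in the paper are not stated here, so the comparison is only indicative.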
