Article

Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs

Journal

COMPUTATION
Volume 10, Issue 6

Publisher

MDPI
DOI: 10.3390/computation10060092

Keywords

lattice Boltzmann method; GPU; in-place streaming; swap algorithm; Esoteric Twist; memory; memory bandwidth; Volume-of-Fluid; FluidX3D; OpenCL

Funding

  1. Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) [391977956-SFB 1357]


This study introduces two novel thread-safe in-place streaming schemes for the lattice Boltzmann method on GPUs. These schemes reduce memory demand by requiring only one copy of the density distribution functions. They slightly improve performance over the Esoteric Twist scheme through better memory coalescence, are compatible with all velocity sets, and are suitable for automatic code generation.
I present two novel thread-safe in-place streaming schemes for the lattice Boltzmann method (LBM) on graphics processing units (GPUs), termed Esoteric Pull and Esoteric Push, that result in the LBM only requiring one copy of the density distribution functions (DDFs) instead of two, greatly reducing memory demand. These build upon the idea of the existing Esoteric Twist scheme, namely to stream half of the DDFs at the end of one stream-collide kernel and the remaining half at the beginning of the next, and offer the same beneficial properties over the AA-Pattern scheme: reduced memory bandwidth due to implicit bounce-back boundaries and the possibility of swapping pointers between even and odd time steps. However, the streaming directions are chosen in a way that allows the algorithm to be implemented in about one tenth the amount of code, as two simple loops, and is compatible with all velocity sets and suitable for automatic code generation. The performance of the new streaming schemes is slightly increased over Esoteric Twist due to better memory coalescence. Benchmarks across a large variety of GPUs and CPUs show that for most dedicated GPUs, performance differs only insignificantly from the One-Step Pull scheme; however, for integrated GPUs and CPUs, performance is significantly improved. The two proposed algorithms greatly facilitate modifying existing code to in-place streaming, even with extensions already in place, as demonstrated here for the Free Surface LBM implementation FluidX3D. Their simplicity, together with their ideal performance characteristics, may enable more widespread adoption of in-place streaming across LBM GPU codes.
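The "two simple loops" idea can be illustrated with a small single-threaded Python sketch. It is not the paper's OpenCL code, only an illustration of the general pattern under stated assumptions: a D2Q9 velocity set whose directions are stored in opposite pairs (C[i + 1] == -C[i] for odd i), periodic boundaries, a dict standing in for the single DDF buffer, and a pass-through collision. The names `addresses`, `load_f`, `store_f`, `init_f`, `run` and the `collide` hook are hypothetical. Each cell reads half of its DDFs from its own slots and half from neighbor slots, chosen by time-step parity, and writes every post-collision value back into a slot it has just read, with the two members of each opposite pair swapped.

```python
# Sketch of an Esoteric-Pull-style in-place streaming step (single DDF array).
# Assumptions (illustrative only): D2Q9 with opposite directions stored as
# pairs (C[i + 1] == -C[i] for odd i), periodic boundaries, and a dict that
# maps (cell, slot) to a value in place of a flat GPU buffer.

C = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]
Q = len(C)


def neighbor(x, y, i, nx, ny):
    """Periodic neighbor of cell (x, y) in direction C[i]."""
    return ((x + C[i][0]) % nx, (y + C[i][1]) % ny)


def addresses(x, y, t, nx, ny):
    """Slots from which cell (x, y) loads f_0 .. f_{Q-1} at time step t."""
    addr = [((x, y), 0)]
    for i in range(1, Q, 2):
        own = ((x, y), i if t % 2 else i + 1)                     # slot in this cell
        nbr = (neighbor(x, y, i, nx, ny), i + 1 if t % 2 else i)  # slot in cell at +C[i]
        addr += [own, nbr]
    return addr


def load_f(fi, x, y, t, nx, ny):
    """Read the streamed DDFs of one cell."""
    return [fi[a] for a in addresses(x, y, t, nx, ny)]


def store_f(fi, fhn, x, y, t, nx, ny):
    """Write post-collision DDFs back into the slots just read, pairs swapped."""
    addr = addresses(x, y, t, nx, ny)
    fi[addr[0]] = fhn[0]
    for i in range(1, Q, 2):
        fi[addr[i + 1]] = fhn[i]  # f_i goes where f_{i+1} was loaded from
        fi[addr[i]] = fhn[i + 1]  # f_{i+1} goes where f_i was loaded from


def init_f(f0, nx, ny):
    """Lay out initial DDFs f0[(x, y)] so that load_f at t = 0 returns them."""
    fi = {}
    for x in range(nx):
        for y in range(ny):
            for a, v in zip(addresses(x, y, 0, nx, ny), f0[(x, y)]):
                fi[a] = v
    return fi


def run(f0, nx, ny, steps, collide=lambda f: f):
    """Advance `steps` time steps in place; `collide` is a hypothetical hook."""
    fi = init_f(f0, nx, ny)  # the only DDF storage: nx * ny * Q values
    for t in range(steps):
        for x in range(nx):
            for y in range(ny):
                store_f(fi, collide(load_f(fi, x, y, t, nx, ny)), x, y, t, nx, ny)
    return {(x, y): load_f(fi, x, y, steps, nx, ny)
            for x in range(nx) for y in range(ny)}


# Usage: with the pass-through collision, out[(x, y)][i] equals
# f0[((x - 3 * C[i][0]) % nx, (y - 3 * C[i][1]) % ny)][i], i.e. pure streaming.
nx, ny = 5, 4
f0 = {(x, y): [float(100 * x + 10 * y + i) for i in range(Q)]
      for x in range(nx) for y in range(ny)}
out = run(f0, nx, ny, steps=3)
```

In this sketch each cell writes exactly the slots it read in the same step, and those slot sets do not overlap between cells, so running the cell loop sequentially or in parallel gives the same result. This is the property that lets such a scheme update a single DDF array in place without races.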
