Article

A Massively Parallel and Scalable Multi-GPU Material Point Method

ACM TRANSACTIONS ON GRAPHICS (2020)

Journal

ACM TRANSACTIONS ON GRAPHICS

Volume 39, Issue 4, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3386569.3392442

Keywords

Numerical methods; parallel computing; GPU

Categories

Computer Science, Software Engineering

Funding

National Key R&D Program of China [2017YFB1002703]
NSFC [61972341, 61972342, 61732015, 61572423]
NSF CAREER [IIS-1943199, CCF-1813624]
DOE ORNL [4000171342]
NVIDIA GPU grants
ONR MURI [N00014-16-1-2007]
DARPA XAI [N66001-17-2-4029]
ONR [N00014-19-1-2153]
Exascale Computing Project [17-SC-20-SC]
U.S. Department of Energy [DE-AC05-00OR22725]

Abstract

Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point Method (MPM) for simulating physical behaviors of materials undergoing complex topological changes, self-collision, and large deformations. Our system makes three critical contributions. First, we introduce a new particle data structure that promotes coalesced memory access patterns on the GPU and eliminates the need for complex atomic operations on the memory hierarchy when writing particle data to the grid. Second, we propose a kernel fusion approach using a new Grid-to-Particles-to-Grid (G2P2G) scheme, which efficiently reduces GPU kernel launches, improves latency, and significantly reduces the amount of global memory needed to store particle data. Finally, we introduce optimized algorithmic designs that allow for efficient sparse grids in a shared memory context, enabling us to best utilize modern multi-GPU computational platforms for hybrid Lagrangian-Eulerian computational patterns. We demonstrate the effectiveness of our method with extensive benchmarks, evaluations, and dynamic simulations with elastoplasticity, granular media, and fluid dynamics. In comparisons against an open-source and heavily optimized CPU-based MPM codebase [Fang et al. 2019] on an elastic sphere colliding scene with particle counts ranging from 5 to 40 million, our GPU MPM achieves over 100x per-time-step speedup on a workstation with an Intel 8086K CPU and a single Quadro P6000 GPU, exposing exciting possibilities for future MPM simulations in computer graphics and computational science. Moreover, compared to the state-of-the-art GPU MPM method [Hu et al. 2019a], we not only achieve a 2x speedup on a single GPU, but our kernel fusion strategy and Array-of-Structs-of-Array (AoSoA) data structure design also generalize to multi-GPU systems. Our multi-GPU MPM exhibits near-perfect weak and strong scaling with 4 GPUs, enabling performant and large-scale simulations on a 1024^3 grid with close to 100 million particles at less than 4 minutes per frame on a single 4-GPU workstation, and with 134 million particles at less than 1 minute per frame on an 8-GPU workstation.
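
The AoSoA layout named in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' implementation: the bin width of 32, the field names, and the `ParticleStore` type are all hypothetical, chosen only to show the general idea that each attribute of neighboring particles sits contiguously inside a fixed-size bin, which is the property that tends to favor coalesced access when consecutive threads read consecutive particles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bin width (an assumption; picked to match a 32-thread warp).
constexpr std::size_t kLanes = 32;

// One bin: a struct of small fixed-size arrays (the "SoA" part).
// Field names are illustrative only.
struct ParticleBin {
    std::array<float, kLanes> mass;
    std::array<float, kLanes> pos_x;
    std::array<float, kLanes> pos_y;
    std::array<float, kLanes> pos_z;
};

// An array of such bins (the outer "AoS" part).
struct ParticleStore {
    std::vector<ParticleBin> bins;

    // Allocate enough bins to hold n particles (rounded up); values start at zero.
    explicit ParticleStore(std::size_t n) : bins((n + kLanes - 1) / kLanes) {}

    // Particle i lives in bin i / kLanes, lane i % kLanes.
    float& pos_x(std::size_t i) {
        assert(i / kLanes < bins.size());
        return bins[i / kLanes].pos_x[i % kLanes];
    }
    float& mass(std::size_t i) {
        assert(i / kLanes < bins.size());
        return bins[i / kLanes].mass[i % kLanes];
    }
};
```

With this layout, `pos_x(0)` and `pos_x(1)` are adjacent floats in memory, whereas a plain array-of-structs would place the other attributes of particle 0 between them.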
