Article

A thread-block-wise computational framework for large-scale hierarchical continuum-discrete modeling of granular media

Journal

Publisher

WILEY
DOI: 10.1002/nme.6549

Keywords

continuum-discrete coupling; DEM; granular media; MPM; multiscale modeling; parallel computing

Funding

  1. Hong Kong Scholars Program [XJ2018049]
  2. National Natural Science Foundation of China [11972030, 51679207, 51909095]
  3. Research Grants Council of Hong Kong [16205418, C6012-15G, T22-603/15N]

The article introduces a parallel computing framework for large-scale and multiscale simulations of granular media that combines thread-block-wise RVE parallelism with GPU-specific techniques. Benchmark tests show that its in-house code, GoDEM, can reach a saturated speedup of approximately 350 compared with a single-CPU-core code.
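The page gives no code, so the following Python sketch only emulates the thread-block-wise RVE mapping on the CPU. Block index b handles RVE b, and the threads of a block stride over that RVE's particles. The block size, the one-dimensional particle state, the explicit velocity/position update, and every name (`RVE`, `block_kernel`, `launch`, `make_rves`) are illustrative assumptions rather than GoDEM's implementation. Contact detection and force computation are omitted.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

BLOCK_DIM = 4  # hypothetical threads per block; real GPU blocks are usually 64-1024 threads


@dataclass
class RVE:
    """One RVE stored as a structure of arrays (one array per particle field)."""
    x: list[float]  # particle positions (1D for brevity)
    v: list[float]  # particle velocities
    f: list[float]  # net particle forces, assumed already computed


def block_kernel(rve: RVE, dt: float, mass: float) -> None:
    """Emulate one thread block advancing its own RVE by one explicit time step.

    Thread `tid` updates particles tid, tid + BLOCK_DIM, ... (a block-stride
    loop). On a GPU, neighboring threads would then touch neighboring array
    entries, the access pattern that allows coalesced memory transactions.
    Contact detection and force computation are omitted; in a real kernel they
    would be separated from this update by a block barrier (__syncthreads).
    """
    n = len(rve.x)
    for tid in range(BLOCK_DIM):  # these iterations would run concurrently on a GPU
        for i in range(tid, n, BLOCK_DIM):
            rve.v[i] += dt * rve.f[i] / mass
            rve.x[i] += dt * rve.v[i]


def launch(rves: list[RVE], dt: float, mass: float, shuffle: bool = False) -> None:
    """Emulate a grid launch with one block per RVE (block index == RVE index)."""
    order = list(range(len(rves)))
    if shuffle:
        random.shuffle(order)  # the hardware may schedule blocks in any order
    for block_idx in order:
        block_kernel(rves[block_idx], dt, mass)


def make_rves(num_rves: int, n: int) -> list[RVE]:
    """Build toy RVEs: RVE r applies a uniform force r + 1 to each of its n particles."""
    return [
        RVE(x=[float(i) for i in range(n)], v=[0.0] * n, f=[float(r + 1)] * n)
        for r in range(num_rves)
    ]


a = make_rves(3, 10)
b = make_rves(3, 10)
launch(a, dt=0.1, mass=1.0)
launch(b, dt=0.1, mass=1.0, shuffle=True)
print(all(ra.x == rb.x for ra, rb in zip(a, b)))  # True: block order does not matter
```

Because no block reads or writes another RVE's arrays, shuffling the block order leaves the result bit-for-bit unchanged. This mirrors why independent RVEs allow the thread blocks to run asynchronously.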
This article presents a novel, scalable parallel computing framework for large-scale and multiscale simulations of granular media. Key to the new framework is an innovative thread-block-wise representative volume element (RVE) parallelism, inspired by the resemblance between a typical multiscale computational hierarchy and the hierarchical thread structure of graphics processing units (GPUs). To solve a hierarchical multiscale problem, all computation in an RVE is assigned to a single block of threads, so that the RVE runs entirely on the GPU and frequent data exchange with the host CPU is avoided. Meanwhile, the thread blocks can run asynchronously, which implicitly guarantees the independence of inter-RVE computation featured by the hierarchical multiscale structure. The parallel computing algorithms are formulated and implemented in an in-house code, GoDEM, which employs GPU-specific techniques such as coalesced memory access, shared memory utilization, and unified memory. Benchmark and performance tests are conducted against an open-source CPU-based DEM code under three typical loading conditions. The performance of GoDEM is examined with respect to thread-block size, GPU register pressure, and the number of RVEs. The tests reveal that increasing GPU occupancy by decreasing register pressure significantly degrades rather than improves performance. We further demonstrate that the proposed GPU parallelism framework may achieve a saturated speedup of approximately 350 compared with the single-CPU-core code. As a demonstration of its application to multiscale modeling of granular media, the material point method (MPM) is coupled with the DEM powered by the new framework to simulate a typical engineering-scale problem involving tens of millions of particles in total.
The simulation shows that the proposed framework achieves a speedup of approximately 91 compared with a similar CPU program running on a cluster node with 44 parallel threads. The study offers a viable solution for future large-scale and multiscale modeling of granular media.
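The register-pressure finding above can be read against a simplified occupancy calculation. The sketch below uses a hypothetical function name and assumed hardware limits (65,536 registers, 2,048 resident threads, and 32 resident blocks per streaming multiprocessor, values that match some NVIDIA GPUs such as compute capability 7.0). It shows that, at a fixed block size, halving the registers per thread doubles the resident blocks per SM until the thread limit is reached. It ignores register-allocation granularity and shared-memory limits.

```python
def theoretical_occupancy(regs_per_thread: int, block_size: int,
                          regs_per_sm: int = 65536,
                          max_threads_per_sm: int = 2048,
                          max_blocks_per_sm: int = 32) -> float:
    """Fraction of an SM's thread slots that resident blocks can fill.

    Simplified model (an assumption, not a vendor formula): ignores
    register-allocation granularity and shared-memory limits. Default limits
    match some NVIDIA GPUs (e.g. compute capability 7.0); other GPUs differ.
    A result of 0.0 means a single block would not fit at all.
    """
    blocks_by_regs = regs_per_sm // (regs_per_thread * block_size)
    blocks_by_threads = max_threads_per_sm // block_size
    resident_blocks = min(blocks_by_regs, blocks_by_threads, max_blocks_per_sm)
    return resident_blocks * block_size / max_threads_per_sm


for regs in (128, 64, 32):
    print(regs, theoretical_occupancy(regs, block_size=256))
# 128 0.25
# 64 0.5
# 32 1.0
```

Higher occupancy is therefore attainable on paper, yet the abstract reports that it degraded GoDEM's performance. One common reason a register cap (for example via nvcc's `-maxrregcount` option or `__launch_bounds__`) can backfire is that registers spill to slower local memory; the abstract does not state which mechanism applies to GoDEM.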
