4.2 Article

Performance Analysis of Speculative Parallel Adaptive Local Timestepping for Conservation Laws

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3545996

Keywords

Local timestepping; parallel discrete event simulation; Timewarp; shallow water equations; conservation laws

Funding

  1. U.S. Department of Energy [DE-AC02-05CH11231]
  2. National Science Foundation [NSF 1854986]

This article introduces an adaptive local timestepping algorithm built on optimistic parallel discrete event simulation, together with waiting heuristics that limit misspeculation and a semi-static load balancing scheme. The algorithm reduces the work needed to simulate conservation laws and improves parallel performance.
Stable simulation of conservation laws, such as those used to model fluid dynamics and plasma physics applications, requires satisfying the Courant-Friedrichs-Lewy condition. Allowing regions of the mesh to advance with different timesteps that locally satisfy this stability constraint can significantly reduce work compared to a time integration scheme that uses a single timestep size. However, parallelizing this algorithm presents considerable difficulty: since the stability condition depends on the state of the system, dependencies become dynamic and potentially non-local. In this article, we present an adaptive local timestepping algorithm using an optimistic (Timewarp-based) parallel discrete event simulation. We introduce waiting heuristics to limit misspeculation and a semi-static load balancing scheme to eliminate load imbalance as parts of the mesh require finer or coarser timesteps. Finally, we outline an interface that separates the physics of the specific conservation law from the temporal integration, allowing for productive adoption of our proposed algorithm. We present a misspeculation study for three conservation laws, demonstrating both the productivity of the local timestepping API, for which 74% of the lines of code are reused across different conservation laws, and the robustness of the waiting heuristics, under which at most 1.5% of element updates are rolled back. Our performance studies demonstrate up to a 2.8x speedup versus a baseline unoptimized local timestepping approach, a 4x improvement in per-node throughput compared to an MPI parallelization of synchronous timestepping, and scalability up to 3,072 cores on NERSC's Cori Haswell partition.
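
To illustrate why local timestepping reduces work, the following minimal sketch counts element updates under a single global timestep versus per-element timesteps. It is not the article's algorithm: the function names (`cfl_timestep`, `work_comparison`), the CFL number of 0.5, and the four-element mesh are all assumptions chosen for illustration, and the sketch ignores the neighbor synchronization, speculation, and rollback that the article addresses.

```python
import math


def cfl_timestep(dx, wave_speed, cfl=0.5):
    """Largest stable step for one element under dt <= cfl * dx / |wave_speed|.

    The CFL number 0.5 is an illustrative assumption, not a value from the article.
    """
    return cfl * dx / abs(wave_speed)


def work_comparison(dxs, speeds, t_end, cfl=0.5):
    """Count element updates needed to reach t_end.

    Returns (global_updates, local_updates):
    - global: every element advances with the smallest stable timestep;
    - local: each element advances with its own stable timestep.
    """
    local_dts = [cfl_timestep(dx, s, cfl) for dx, s in zip(dxs, speeds)]
    global_dt = min(local_dts)
    global_updates = len(dxs) * math.ceil(t_end / global_dt)
    local_updates = sum(math.ceil(t_end / dt) for dt in local_dts)
    return global_updates, local_updates


# Hypothetical mesh: three coarse, slow elements and one fine, fast element.
dxs = [1.0, 1.0, 1.0, 0.25]
speeds = [1.0, 1.0, 1.0, 2.0]
global_updates, local_updates = work_comparison(dxs, speeds, t_end=1.0)
print(global_updates, local_updates)  # 64 22
```

In this toy case a single small, fast element forces all four elements to take 16 steps under global timestepping, whereas local timestepping lets the three coarse elements take only 2 steps each. Because the wave speed depends on the evolving solution, the stable step for each element changes over time, which is the source of the dynamic dependencies the abstract describes.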
