Article

Parallelizing and optimizing large-scale 3D multi-phase flow simulations on the Tianhe-2 supercomputer

Journal

Concurrency and Computation: Practice and Experience
Volume 28, Issue 5, Pages 1678-1692

Publisher

Wiley
DOI: 10.1002/cpe.3717

Keywords

heterogeneous system; Intel Xeon Phi; Tianhe-2; multi-phase flow; LBM

Funding

  1. Basic Research Program of National University of Defense Technology [ZDYYJCYJ20140101]
  2. Open Research Program of China State Key Laboratory of Aerodynamics [SKLA20140104]
  3. IAPCM Application Research Program for High Performance Computing [R2015-0402-01]
  4. National Science Foundation of China [11502296]


The lattice Boltzmann method (LBM) is a widely used computational fluid dynamics (CFD) method for flow problems with complex geometries and various boundary conditions. Large-scale LBM simulations with increasing resolution and extended temporal range require massive high-performance computing (HPC) resources, motivating us to port the method onto modern many-core heterogeneous supercomputers such as Tianhe-2. Although many-core accelerators such as graphics processing units (GPUs) and Intel MIC offer a dramatic advantage in floating-point performance and power efficiency over CPUs, they also pose a tough challenge for parallelizing and optimizing CFD codes on large-scale heterogeneous systems. In this paper, we parallelize and optimize the open-source 3D multi-phase LBM code openlbmflow on the Intel Xeon Phi (MIC) accelerated Tianhe-2 supercomputer using a hybrid, heterogeneous MPI+OpenMP+Offload+single instruction, multiple data (SIMD) programming model. With cache blocking and a SIMD-friendly data-structure transformation, we dramatically improve SIMD and cache efficiency in single-thread performance on both the CPU and the Phi, achieving speedups of 7.9X and 8.8X, respectively, over the baseline code. To make CPUs and Phi coprocessors collaborate efficiently, we propose a load-balance scheme that distributes workloads among the two CPUs and three Phi coprocessors within each node, and we use an asynchronous model to overlap the collaborative computation and communication as much as possible. The collaborative approach with two CPUs and three Phi coprocessors improves performance by around 3.2X compared with the CPU-only approach. Scalability tests show that openlbmflow achieves a parallel efficiency of about 60% on 2048 nodes, with about 400K cores in total. To the best of our knowledge, this is the largest-scale CPU-MIC collaborative LBM simulation of 3D multi-phase flow problems. Copyright (c) 2015 John Wiley & Sons, Ltd.

