Article

Multi-GPU performance optimization of a computational fluid dynamics code using OpenACC

Related references

Note: Only a subset of the references is listed.
Article Computer Science, Interdisciplinary Applications

MPI+OpenACC: Accelerating radiation transport mini-application, minisweep, on heterogeneous systems

Robert Searles et al.

COMPUTER PHYSICS COMMUNICATIONS (2019)

Proceedings Paper Computer Science, Theory & Methods

Optimizing Computation-Communication Overlap in Asynchronous Task-Based Programs

Emilio Castillo et al.

INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2019) (2019)

Article Computer Science, Hardware & Architecture

Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations

Jing Gong et al.

JOURNAL OF SUPERCOMPUTING (2016)

Proceedings Paper Computer Science, Theory & Methods

MPI Overlap: Benchmark and Analysis

Alexandre Denis et al.

PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016 (2016)

Article Computer Science, Interdisciplinary Applications

Directive-based GPU programming for computational fluid dynamics

Brent P. Pickering et al.

COMPUTERS & FLUIDS (2015)

Proceedings Paper Computer Science, Theory & Methods

Improving Concurrency and Asynchrony in Multithreaded MPI Applications using Software Offloading

Karthikeyan Vaidyanathan et al.

PROCEEDINGS OF SC15: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (2015)

Proceedings Paper Computer Science, Software Engineering

MPI+ULT: Overlapping Communication and Computation with User-Level Threads

Huiwei Lu et al.

2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) (2015)

Proceedings Paper Computer Science, Software Engineering

Evaluating Performance Portability of OpenACC

Amit Sabne et al.

LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING (LCPC 2014) (2015)

Proceedings Paper Computer Science, Theory & Methods

CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application

Tetsuya Hoshino et al.

PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013) (2013)

Article Computer Science, Theory & Methods

Partitioning strategies for structured multiblock grids

J Rantakokko

PARALLEL COMPUTING (2000)