Efficient parallel implementation of crowd simulation using a hybrid CPU plus GPU high performance computing system

SIMULATION MODELLING PRACTICE AND THEORY (2023)

Journal

SIMULATION MODELLING PRACTICE AND THEORY

Volume 123

Publisher

ELSEVIER

DOI: 10.1016/j.simpat.2022.102691

Keywords

Crowd simulation; Parallelization; Hybrid CPU plus GPU system; High performance computing

Categories

Computer Science, Interdisciplinary Applications; Computer Science, Software Engineering

AI summary

This paper introduces an efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and shows that it outperforms CPU-only and GPU-only implementations across several problem sizes. The authors investigate how tile sizes and CPU-GPU load balancing settings affect performance, and analyze execution time as a function of the number of agents and the number of CUDA streams. They also discuss the design and implementation of the algorithm, including CPU computational threads, GPU management threads, task assignment, and memory utilization.

Abstract

In this paper, we present a modern, efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes, including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies with tile size and which CPU-GPU load balancing settings should be preferred for various domain sizes when work is divided among the CPUs and GPUs of a high performance system with 2 Intel Xeon Silver multicore CPUs and 8 NVIDIA Quadro RTX 5000 GPUs. We then present how execution time depends on the number of agents as well as on the number of CUDA streams used for parallel execution of several CUDA kernels. We discuss the design and implementation of an algorithm with CPU computational threads and GPU management threads, the assignment of particular tasks to threads, and the use of pinned memory and CUDA shared memory for maximizing performance.
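The abstract mentions dividing a tiled domain between CPUs and GPUs according to a load balancing setting. As a rough illustration only, the sketch below splits a number of tiles between one CPU group and several GPUs. The names `TileRange`, `split_tiles`, and `cpu_share` are hypothetical and do not come from the paper; the even split of the remaining tiles among GPUs is an assumption, not the authors' scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open range of tile indices assigned to one worker.
struct TileRange {
    std::size_t begin;  // first tile index (inclusive)
    std::size_t end;    // one past the last tile index
};

// Assumed scheme: the CPU group receives a fraction `cpu_share` of the tiles
// (rounded down); the remaining tiles are divided as evenly as possible
// among `num_gpus` GPUs. Entry 0 of the result is the CPU range, entries
// 1..num_gpus are the GPU ranges.
std::vector<TileRange> split_tiles(std::size_t num_tiles,
                                   double cpu_share,
                                   std::size_t num_gpus) {
    assert(cpu_share >= 0.0 && cpu_share <= 1.0);

    // With no GPUs available, the CPU group takes every tile.
    std::size_t cpu_tiles =
        (num_gpus == 0)
            ? num_tiles
            : static_cast<std::size_t>(cpu_share * static_cast<double>(num_tiles));

    std::vector<TileRange> ranges;
    ranges.push_back({0, cpu_tiles});

    std::size_t remaining = num_tiles - cpu_tiles;
    std::size_t start = cpu_tiles;
    for (std::size_t g = 0; g < num_gpus; ++g) {
        // The first (remaining % num_gpus) GPUs take one extra tile.
        std::size_t count = remaining / num_gpus + (g < remaining % num_gpus ? 1 : 0);
        ranges.push_back({start, start + count});
        start += count;
    }
    return ranges;
}
```

For instance, `split_tiles(100, 0.25, 8)` would give the CPU group tiles 0 to 24 and spread the other 75 tiles over eight GPUs. In a real hybrid code, each GPU range would typically be handled by its own management thread, while the CPU range would be processed by the computational threads.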
