☆ 4.7 Article

cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2020)

期刊

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

卷 31, 期 4, 页码 766-778

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TPDS.2019.2944602

关键词

Kernel; Graphics processing units; Benchmark testing; Hardware; Scheduling; Analytical models; Kernel; scheduling; concurrent kernel execution; stream; resource management

类别

Computer Science, Theory & Methods Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

While GPUs are meantime omnipresent for many scientific and technical computations, they still continue to evolve as processors. An important recent feature is the ability to execute multiple kernels concurrently via queue streams. However, experiments show that different parameters including the behavior of kernels, the order of kernel launches and other execution configurations, e.g., the number of concurrent thread blocks, may result in different execution time for concurrent kernel execution. Since kernels may have different resource requirements, they can be classified into different classes, which are traditionally assumed as either memory-bound or compute-bound. However, a kernel may belong to the different classes on different hardware according to the hardware resources. In this paper, the definition of kernel mix intensity is introduced. Based on this, a scheduling framework called concurrent CUDA (cCUDA) is proposed to co-schedule the concurrent kernels more efficiently. It first profiles and ranks kernels with different execution behaviors and then takes the kernel resource requirements into account to partition thread blocks of different kernels and overlap them to better utilize the GPU resources. Experimental results on real hardware demonstrate performance improvement in terms of execution time of up to 1.86x, and an average speedup of 1.28x for a wide range of kernels. cCUDA is available at https://github.com/kshekofteh/cCUDA.

cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs

期刊

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

出版社

IEEE COMPUTER SOC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs

期刊

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

出版社

IEEE COMPUTER SOC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文