4.3 Article

PAVER: Locality Graph-Based Thread Block Scheduling for GPUs

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3451164

关键词

GPGPU; thread block; dependency graph; locality

资金

  1. NSF [CCF-1815643, CCF-1907401]

向作者/读者索取更多资源

Research shows that considering data access locality in thread scheduling in GPUs is crucial to reduce cache contention problems. A scheduler named PAVER optimizes thread scheduling by using a graph-theoretic approach and data sharing among thread blocks, significantly improving performance.
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence, the data access locality of an application should be considered during thread scheduling to improve execution time and energy consumption. Recent works have tried to use the locality behavior of regular and structured applications in thread scheduling, but the difficult case of irregular and unstructured parallel applications remains to be explored. We present PAVER, a Priority-Aware Vertex schedulER, which takes a graph-theoretic approach toward thread scheduling. We analyze the cache locality behavior among thread blocks (TBs) through a just-in-time compilation, and represent the problem using a graph representing the TBs and the locality among them. This graph is then partitioned to TB groups that display maximum data sharing, which are then assigned to the same streaming multiprocessor by the locality-aware TB scheduler. Through exhaustive simulation in Fermi, Pascal, and Volta architectures using a number of scheduling techniques, we show that PAVER reduces L2 accesses by 43.3%, 48.5%, and 40.21% and increases the average performance benefit by 29%, 49.1%, and 41.2% for the benchmarks with high inter-TB locality.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.3
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据