Journal
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Volume 18, Issue 3
Publisher: ASSOC COMPUTING MACHINERY
DOI: 10.1145/3451164
Keywords
GPGPU; thread block; dependency graph; locality
Funding
- NSF [CCF-1815643, CCF-1907401]
Research shows that accounting for data access locality during GPU thread scheduling is crucial for reducing cache contention. PAVER, a scheduler that exploits data sharing among thread blocks through a graph-theoretic approach, significantly improves performance.
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence, the data access locality of an application should be considered during thread scheduling to improve execution time and energy consumption. Recent works have tried to use the locality behavior of regular and structured applications in thread scheduling, but the difficult case of irregular and unstructured parallel applications remains to be explored. We present PAVER, a Priority-Aware Vertex schedulER, which takes a graph-theoretic approach toward thread scheduling. We analyze the cache locality behavior among thread blocks (TBs) through just-in-time compilation, and represent the problem as a graph whose vertices are the TBs and whose edges capture the locality among them. This graph is then partitioned into TB groups that display maximum data sharing, which are assigned to the same streaming multiprocessor by the locality-aware TB scheduler. Through exhaustive simulations on the Fermi, Pascal, and Volta architectures using a number of scheduling techniques, we show that PAVER reduces L2 accesses by 43.3%, 48.5%, and 40.21% and increases the average performance benefit by 29%, 49.1%, and 41.2% for the benchmarks with high inter-TB locality.
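The abstract's pipeline (build a locality graph over TBs, partition it into groups with maximum data sharing, then co-schedule each group on one SM) can be sketched as follows. This is an illustrative toy model, not PAVER's actual implementation: the `tb_accesses` representation, the greedy partitioning heuristic, and all function names are assumptions made for the example.

```python
# Illustrative sketch (not PAVER's actual algorithm): group thread blocks
# (TBs) by the number of cache lines they share, so each group can be
# assigned to a single streaming multiprocessor.
from collections import defaultdict

def build_locality_graph(tb_accesses):
    """Edge weight = number of cache lines shared by a pair of TBs.
    tb_accesses: dict {tb_id: set of cache-line addresses}."""
    graph = defaultdict(int)
    tbs = sorted(tb_accesses)
    for i, a in enumerate(tbs):
        for b in tbs[i + 1:]:
            shared = len(tb_accesses[a] & tb_accesses[b])
            if shared:
                graph[(a, b)] = shared  # key is always (smaller, larger)
    return graph

def partition_tbs(tb_accesses, group_size):
    """Greedy partition: seed a group with an unassigned TB, then pull in
    the unassigned TBs that share the most cache lines with the group."""
    graph = build_locality_graph(tb_accesses)
    unassigned = set(tb_accesses)
    groups = []
    while unassigned:
        seed = max(unassigned)  # any deterministic seed choice works here
        group = [seed]
        unassigned.remove(seed)
        while len(group) < group_size and unassigned:
            # Pick the TB with the largest total sharing with the group.
            best = max(unassigned, key=lambda t: sum(
                graph.get((min(t, g), max(t, g)), 0) for g in group))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups

# Example: TBs 0 and 1 touch overlapping lines, as do TBs 2 and 3,
# so the partitioner should co-locate {0,1} and {2,3}.
accesses = {0: {10, 11}, 1: {10, 11, 12}, 2: {20, 21}, 3: {21, 22}}
groups = partition_tbs(accesses, group_size=2)
```

A real scheduler would derive the sharing information from compile-time analysis rather than explicit address sets, and would use a proper graph-partitioning algorithm instead of this greedy loop, but the data flow (accesses → weighted graph → partitions → SM assignment) is the same.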