Journal
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Volume 18, Issue 3
Publisher: ASSOC COMPUTING MACHINERY
DOI: 10.1145/3451164
Keywords
GPGPU; thread block; dependency graph; locality
Funding
- NSF [CCF-1815643, CCF-1907401]
Research shows that accounting for data access locality during GPU thread scheduling is crucial for reducing cache contention. PAVER, a scheduler that exploits data sharing among thread blocks through a graph-theoretic approach, significantly improves performance.
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence, the data access locality of an application should be considered during thread scheduling to improve execution time and energy consumption. Recent works have tried to use the locality behavior of regular and structured applications in thread scheduling, but the difficult case of irregular and unstructured parallel applications remains to be explored. We present PAVER, a Priority-Aware Vertex schedulER, which takes a graph-theoretic approach toward thread scheduling. We analyze the cache locality behavior among thread blocks (TBs) through just-in-time compilation, and represent the problem as a graph whose vertices are the TBs and whose edges capture the locality among them. This graph is then partitioned into TB groups that display maximum data sharing, which are assigned to the same streaming multiprocessor by the locality-aware TB scheduler. Through exhaustive simulations on the Fermi, Pascal, and Volta architectures using a number of scheduling techniques, we show that PAVER reduces L2 accesses by 43.3%, 48.5%, and 40.21% and increases the average performance benefit by 29%, 49.1%, and 41.2% for the benchmarks with high inter-TB locality.
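The abstract's pipeline (build a locality graph over TBs, partition it into groups with maximum data sharing, then co-schedule each group on one SM) can be sketched as follows. This is an illustrative toy model, not PAVER's actual implementation: the `tb_accesses` representation, the greedy partitioning heuristic, and all function names are assumptions made for the example.

```python
# Illustrative sketch (not PAVER's actual algorithm): group thread blocks
# (TBs) by the number of cache lines they share, so each group can be
# assigned to a single streaming multiprocessor.
from collections import defaultdict

def build_locality_graph(tb_accesses):
    """Edge weight = number of cache lines shared by a pair of TBs.
    tb_accesses: dict {tb_id: set of cache-line addresses}."""
    graph = defaultdict(int)
    tbs = sorted(tb_accesses)
    for i, a in enumerate(tbs):
        for b in tbs[i + 1:]:
            shared = len(tb_accesses[a] & tb_accesses[b])
            if shared:
                graph[(a, b)] = shared  # key is always (smaller, larger)
    return graph

def partition_tbs(tb_accesses, group_size):
    """Greedy partition: seed a group with an unassigned TB, then pull in
    the unassigned TBs that share the most cache lines with the group."""
    graph = build_locality_graph(tb_accesses)
    unassigned = set(tb_accesses)
    groups = []
    while unassigned:
        seed = max(unassigned)  # any deterministic seed choice works here
        group = [seed]
        unassigned.remove(seed)
        while len(group) < group_size and unassigned:
            # Pick the TB with the largest total sharing with the group.
            best = max(unassigned, key=lambda t: sum(
                graph.get((min(t, g), max(t, g)), 0) for g in group))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups

# Example: TBs 0 and 1 touch overlapping lines, as do TBs 2 and 3,
# so the partitioner should co-locate {0,1} and {2,3}.
accesses = {0: {10, 11}, 1: {10, 11, 12}, 2: {20, 21}, 3: {21, 22}}
groups = partition_tbs(accesses, group_size=2)
```

A real scheduler would derive the sharing information from compile-time analysis rather than explicit address sets, and would use a proper graph-partitioning algorithm instead of this greedy loop, but the data flow (accesses → weighted graph → partitions → SM assignment) is the same.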