Published in
2021 IEEE International Conference on Networking, Architecture and Storage (NAS)
Volume: -, Issue: -, Pages: 124-131
Publisher
IEEE
DOI: 10.1109/NAS51552.2021.9605411
Keywords
GPGPU; Cache management; Data locality; Syntax tree
Funding
- National Science Foundation (NSF) [1907401, 1815643]
- Government of India SPARC [P712]
Abstract
Exploiting data locality in GPGPUs is critical for efficiently using the smaller data caches and handling the memory bottleneck problem. This paper proposes a thread block-centric locality analysis, which identifies the locality among thread blocks (TBs) in terms of the number of common data references. In LocalityGuru, we seek to employ a detailed just-in-time (JIT) compilation analysis of the static memory accesses in the source code and derive the mapping between threads and data indices at kernel launch time. Our locality analysis technique can be employed at multiple granularities, such as threads, warps, and thread blocks in a GPU kernel. This information can be leveraged to make smarter decisions for locality-aware data partitioning, memory page data placement, cache management, and scheduling in single-GPU and multi-GPU systems. The results of the LocalityGuru PTX analyzer are validated by comparing them with the locality graph obtained through profiling. Since the entire analysis is carried out by the compiler before kernel launch, it introduces no timing overhead to the kernel execution time.
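The TB-level locality graph described above can be illustrated with a minimal sketch. The function names and the per-thread access function below are hypothetical (the paper derives the thread-to-data-index mapping from JIT analysis of PTX; here it is supplied directly): each thread block's data footprint is enumerated from the mapping, and each TB pair is weighted by the number of data indices it shares.

```python
from itertools import combinations

def tb_footprint(tb_id, tb_size, access_fn):
    """Data indices touched by one thread block, given a per-thread
    mapping access_fn(tid) -> iterable of data indices (stand-in for
    the statically derived thread-to-index mapping)."""
    base = tb_id * tb_size  # first global thread id in this TB
    indices = set()
    for t in range(tb_size):
        indices.update(access_fn(base + t))
    return indices

def locality_graph(num_tbs, tb_size, access_fn):
    """Edge weight between two TBs = number of common data references."""
    footprints = [tb_footprint(tb, tb_size, access_fn)
                  for tb in range(num_tbs)]
    graph = {}
    for a, b in combinations(range(num_tbs), 2):
        common = len(footprints[a] & footprints[b])
        if common:  # keep only TB pairs that actually share data
            graph[(a, b)] = common
    return graph

# Example: a 1-D stencil-like access tid -> {tid, tid+1} makes adjacent
# thread blocks share exactly one data index at their boundary.
print(locality_graph(3, 4, lambda tid: [tid, tid + 1]))
# → {(0, 1): 1, (1, 2): 1}
```

A scheduler or data-placement policy could then, for instance, co-locate TB pairs with high edge weights on the same GPU or cache partition.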