Published in
2021 IEEE International Conference on Networking, Architecture and Storage (NAS)
Volume: -, Issue: -, Pages: 124-131
Publisher
IEEE
DOI: 10.1109/NAS51552.2021.9605411
Keywords
GPGPU; Cache management; Data locality; Syntax tree
Funding
- National Science Foundation (NSF) [1907401, 1815643]
- Government of India SPARC [P712]
Abstract
Exploiting data locality in GPGPUs is critical for efficiently using the smaller data caches and handling the memory bottleneck problem. This paper proposes a thread block-centric locality analysis, which identifies the locality among thread blocks (TBs) in terms of the number of common data references. In LocalityGuru, we seek to employ a detailed just-in-time (JIT) compilation analysis of the static memory accesses in the source code and derive the mapping between threads and data indices at kernel launch time. Our locality analysis technique can be employed at multiple granularities, such as threads, warps, and thread blocks in a GPU kernel. This information can be leveraged to make smarter decisions for locality-aware data partitioning, memory page data placement, cache management, and scheduling in single-GPU and multi-GPU systems. The results of the LocalityGuru PTX analyzer are validated by comparing them with the locality graph obtained through profiling. Since the entire analysis is carried out by the compiler before kernel launch, it introduces no timing overhead to the kernel execution time.
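The TB-level locality graph described above can be illustrated with a minimal sketch. The function names and the per-thread access function below are hypothetical (the paper derives the thread-to-data-index mapping from JIT analysis of PTX; here it is supplied directly): each thread block's data footprint is enumerated from the mapping, and each TB pair is weighted by the number of data indices it shares.

```python
from itertools import combinations

def tb_footprint(tb_id, tb_size, access_fn):
    """Data indices touched by one thread block, given a per-thread
    mapping access_fn(tid) -> iterable of data indices (stand-in for
    the statically derived thread-to-index mapping)."""
    base = tb_id * tb_size  # first global thread id in this TB
    indices = set()
    for t in range(tb_size):
        indices.update(access_fn(base + t))
    return indices

def locality_graph(num_tbs, tb_size, access_fn):
    """Edge weight between two TBs = number of common data references."""
    footprints = [tb_footprint(tb, tb_size, access_fn)
                  for tb in range(num_tbs)]
    graph = {}
    for a, b in combinations(range(num_tbs), 2):
        common = len(footprints[a] & footprints[b])
        if common:  # keep only TB pairs that actually share data
            graph[(a, b)] = common
    return graph

# Example: a 1-D stencil-like access tid -> {tid, tid+1} makes adjacent
# thread blocks share exactly one data index at their boundary.
print(locality_graph(3, 4, lambda tid: [tid, tid + 1]))
# → {(0, 1): 1, (1, 2): 1}
```

A scheduler or data-placement policy could then, for instance, co-locate TB pairs with high edge weights on the same GPU or cache partition.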