Journal
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
Pages 97-109
Publisher
Association for Computing Machinery
DOI: 10.1145/3410463.3414637
Keywords
Loop Tiling; DMA; Scratchpad Memory; Compiler
Funding
- National Key Research and Development Program of China [2017YFB0202002]
- Strategic Priority Research Program of Chinese Academy of Sciences [XDC05030101]
- National Natural Science Foundation of China [61802368, 61521092, 61432016, 61432018, 61332009, 61702485, 61872043]
- CCF-Tencent Open Research Fund
- Australian Research Council [DP170103956, DP180104069]
Scratchpad memory (SPM) is widely used in emerging domain-specific architectures and accelerators to improve energy efficiency and timing predictability. SPM-based architectures typically use DMA to fetch data from off-chip memory into the SPM and global load instructions to load fine-grained data directly into registers. On such architectures, loop tiling driven only by SPM capacity or only by bandwidth cannot make efficient use of both the available bandwidth and the SPM. This paper introduces a bandwidth-aware loop tiling approach that trades off SPM space utilization against bandwidth utilization, leveraging a runtime tiling framework and a cross-host-kernel interprocedural analysis (IPA). Experimental results show that the approach achieves performance improvements of up to 4x, with a geometric mean of 26%.
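To make the capacity-versus-bandwidth tension concrete, here is a minimal toy sketch. It is not the paper's algorithm, runtime tiling framework, or IPA. It assumes a tiled matrix multiply `C[m][n] += A[m][k] * B[k][n]`, one DMA transfer per tile, a fixed per-transfer latency, and tile sizes that divide the problem dimensions. It also ignores global loads entirely. All function names (`spm_footprint`, `dma_cost`, `choose_tiles`, `capacity_only_tiles`) and the cost model are hypothetical. The sketch contrasts a capacity-only choice (fill the SPM as much as possible) with a search that keeps the same capacity constraint but scores each candidate by its estimated DMA time.

```python
from itertools import product

# Hypothetical toy model: NOT the method from the paper.

def spm_footprint(ti, tj, tk, elem_bytes=8):
    """SPM bytes needed to hold one tile each of A (ti x tk), B (tk x tj), C (ti x tj)."""
    return (ti * tk + tk * tj + ti * tj) * elem_bytes

def dma_cost(m, n, k, ti, tj, tk, bandwidth, latency, elem_bytes=8):
    """Estimated DMA time (s) for tiled C[m][n] += A[m][k] * B[k][n].

    Assumed loop order: i- and j-tiles outer, k-tiles inner, C tile kept resident.
    Assumes one DMA transfer per tile and tile sizes that divide m, n, k.
    """
    n_i, n_j, n_k = m // ti, n // tj, k // tk
    # Every (i, j, k) tile step fetches one A tile and one B tile.
    ab_transfers = n_i * n_j * n_k * 2
    ab_bytes = n_i * n_j * n_k * (ti * tk + tk * tj) * elem_bytes
    # Every C tile is loaded once and stored once.
    c_transfers = n_i * n_j * 2
    c_bytes = 2 * m * n * elem_bytes
    total_bytes = ab_bytes + c_bytes
    return total_bytes / bandwidth + (ab_transfers + c_transfers) * latency

def divisors(x):
    """All positive divisors of x (candidate tile sizes)."""
    return [d for d in range(1, x + 1) if x % d == 0]

def capacity_only_tiles(m, n, k, spm_bytes, elem_bytes=8):
    """Pick the feasible tile shape that fills the most SPM, ignoring bandwidth."""
    feasible = [t for t in product(divisors(m), divisors(n), divisors(k))
                if spm_footprint(*t, elem_bytes) <= spm_bytes]
    return max(feasible, key=lambda t: spm_footprint(*t, elem_bytes))

def choose_tiles(m, n, k, spm_bytes, bandwidth, latency, elem_bytes=8):
    """Among tile shapes that fit in SPM, return (cost, tiles) with the lowest DMA cost."""
    best = None
    for ti, tj, tk in product(divisors(m), divisors(n), divisors(k)):
        if spm_footprint(ti, tj, tk, elem_bytes) > spm_bytes:
            continue  # capacity constraint: tiles must fit in SPM
        cost = dma_cost(m, n, k, ti, tj, tk, bandwidth, latency, elem_bytes)
        if best is None or cost < best[0]:
            best = (cost, (ti, tj, tk))
    return best
```

For example, `choose_tiles(64, 64, 64, 64 * 1024, 8e9, 1e-6)` searches every divisor-based tile shape for a 64 KiB SPM. Because both selectors draw from the same feasible set, the bandwidth-scored choice can never be worse than the capacity-only one under this model. Whether it is strictly better depends on the assumed bandwidth and latency. A real bandwidth-aware tiler would also need to model global loads and, as the abstract notes, problem sizes known only at run time.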