Publisher
Association for Computing Machinery (ACM)
DOI: 10.1145/3410463.3414637
Keywords
Loop Tiling; DMA; Scratchpad Memory; Compiler
Funding
- National Key Research and Development Program of China [2017YFB0202002]
- Strategic Priority Research Program of Chinese Academy of Sciences [XDC05030101]
- National Natural Science Foundation of China [61802368, 61521092, 61432016, 61432018, 61332009, 61702485, 61872043]
- CCF-Tencent Open Research Fund
- Australian Research Council [DP170103956, DP180104069]
Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and accelerators to improve energy efficiency and timing predictability. Typically, SPM-based architectures use DMA to fetch data from off-chip memory and global load instructions to load fine-grained data directly into registers. On such architectures, neither capacity-only nor bandwidth-only loop tiling makes efficient use of both the bandwidth and the SPM. This paper introduces a bandwidth-aware loop tiling approach that trades off SPM space utilization against bandwidth utilization by leveraging a runtime tiling framework and a cross-host-kernel interprocedural analysis (IPA). Experimental results show that our approach achieves performance improvements of up to 4x, with a geometric mean improvement of 26%.
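To make the SPM-space versus bandwidth tradeoff concrete, here is a minimal sketch under a deliberately simple, hypothetical cost model. It is not the paper's algorithm, which relies on a runtime tiling framework and cross-host-kernel IPA. The sketch assumes each DMA transfer pays a fixed startup latency plus (tile size / peak bandwidth). A capacity-only policy would take the largest tile that fits in the SPM. The sketch instead takes the smallest tile that already reaches a target fraction of peak DMA bandwidth, which leaves the remaining SPM space free. The names `DmaModel`, `dma_efficiency`, and `choose_tile`, the `target` threshold, and all numeric parameters are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DmaModel:
    """Hypothetical DMA cost model: fixed startup latency plus size / peak bandwidth."""
    latency_cycles: float   # assumed fixed startup cost per DMA transfer
    bytes_per_cycle: float  # assumed peak DMA bandwidth


def dma_efficiency(tile_bytes: int, m: DmaModel) -> float:
    """Fraction of peak DMA bandwidth achieved by one transfer of tile_bytes."""
    transfer_cycles = tile_bytes / m.bytes_per_cycle
    return transfer_cycles / (m.latency_cycles + transfer_cycles)


def choose_tile(candidates, spm_bytes, buffers, m, target=0.9):
    """Return the smallest candidate tile that fits in the SPM (with `buffers`
    copies, e.g. 2 for double buffering) and reaches `target` DMA efficiency.
    If no fitting tile reaches the target, return the largest fitting tile."""
    fitting = sorted(t for t in candidates if t * buffers <= spm_bytes)
    if not fitting:
        raise ValueError("no candidate tile fits in the SPM")
    for t in fitting:
        if dma_efficiency(t, m) >= target:
            return t
    return fitting[-1]


if __name__ == "__main__":
    model = DmaModel(latency_cycles=100, bytes_per_cycle=8)
    tiles = [2 ** k for k in range(8, 17)]  # 256 B .. 64 KiB
    spm = 64 * 1024
    # Capacity-only baseline: largest double-buffered tile that fits.
    capacity_only = max(t for t in tiles if 2 * t <= spm)
    chosen = choose_tile(tiles, spm, buffers=2, m=model)
    print(capacity_only, chosen, round(dma_efficiency(chosen, model), 3))  # prints "32768 8192 0.911"
```

With these assumed numbers, the capacity-only baseline picks a 32768-byte tile. The sketch settles on 8192 bytes, which reaches about 91% of peak DMA bandwidth while using only a quarter of the SPM that the baseline would occupy. A real bandwidth-aware tiler would also have to weigh data that is fetched with global loads instead of DMA.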