Proceedings Paper

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/ISCA45697.2020.00073

Keywords

processing in memory; analog processing; resistive-random-access-memory (ReRAM); neural networks

Funding

  1. NIH [R01HL144683]
  2. NSF [1838873, 1816833, 1719160, 1725447, 1730309]
  3. Division of Computing and Communication Foundations
  4. Directorate for Computer & Information Science & Engineering [1816833] (funding source: National Science Foundation)
  5. Division of Information & Intelligent Systems
  6. Directorate for Computer & Information Science & Engineering [1838873] (funding source: National Science Foundation)

Resistive-random-access-memory (ReRAM) based processing-in-memory (R²PIM) accelerators show promise in bridging the gap between Internet of Things devices' constrained resources and Convolutional/Deep Neural Networks' (CNNs/DNNs') prohibitive energy cost. Specifically, R²PIM accelerators enhance energy efficiency by eliminating the cost of weight movements and improve computational density through ReRAM's high density. However, their energy efficiency is still limited by the dominant energy cost of input and partial sum (Psum) movements and by the cost of digital-to-analog (D/A) and analog-to-digital (A/D) interfaces. In this work, we identify three energy-saving opportunities in R²PIM accelerators: analog data locality, time-domain interfacing, and input access reduction. We propose an R²PIM accelerator called TIMELY, with three key contributions: (1) TIMELY adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance data locality, minimizing the energy overheads of both input and Psum movements; (2) TIMELY largely reduces the energy of each single D/A (and A/D) conversion by using time-domain interfaces (TDIs), and reduces the total number of conversions by using the ALBs; (3) we develop an only-once input read (O²IR) mapping method to further decrease the energy of input accesses and the number of D/A conversions. The evaluation with more than 10 CNN/DNN models and various chip configurations shows that TIMELY outperforms the baseline R²PIM accelerator, PRIME, by one order of magnitude in energy efficiency while maintaining better computational density (up to 31.2x) and throughput (up to 736.6x). Furthermore, comprehensive studies are performed to evaluate the effectiveness of the proposed ALB, TDI, and O²IR in terms of energy savings and area reduction.
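
To give a rough feel for why input access reduction matters, the sketch below counts input reads for a single 2D convolution channel under two access patterns: one where every output window fetches its inputs afresh, and one where each input element is read exactly once. This is only a counting illustration under stated assumptions (no padding, square kernel, windows that cover the whole input); it is not the mapping method described in the paper, and the function names `window_reads` and `once_reads` are hypothetical.

```python
def window_reads(height, width, kernel, stride=1):
    """Count input reads when every output window re-fetches all of its inputs.

    Assumes no padding and a square kernel x kernel window.
    """
    out_h = (height - kernel) // stride + 1  # number of window positions vertically
    out_w = (width - kernel) // stride + 1   # number of window positions horizontally
    return out_h * out_w * kernel * kernel


def once_reads(height, width):
    """Count input reads when each input element is fetched exactly once.

    Assumes every input element is covered by at least one window.
    """
    return height * width


if __name__ == "__main__":
    h = w = 8
    k = 3
    naive = window_reads(h, w, k)
    once = once_reads(h, w)
    print(f"window-wise reads: {naive}, read-once reads: {once}, "
          f"ratio: {naive / once:.2f}")
```

With overlapping windows (stride smaller than the kernel size), the window-wise count grows roughly with the square of the kernel size, which is the redundancy that a read-once scheme would avoid. With non-overlapping windows the two counts coincide.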