4.6 Article

A Programmable Heterogeneous Microprocessor Based on Bit-Scalable In-Memory Computing

期刊

IEEE JOURNAL OF SOLID-STATE CIRCUITS
卷 55, 期 9, 页码 2609-2621

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSSC.2020.2987714

关键词

Charge-domain compute; deep learning; hardware accelerators; in-memory computing (IMC); neural networks (NNs)

向作者/读者索取更多资源

In-memory computing (IMC) addresses the cost of accessing data from memory in a manner that introduces a tradeoff between energy/throughput and computation signal-to-noise ratio (SNR). However, low SNR posed a primary restriction to integrating IMC in larger, heterogeneous architectures required for practical workloads due to the challenges with creating robust abstractions necessary for the hardware and software stack. This work exploits recent progress in high-SNR IMC to achieve a programmable heterogeneous microprocessor architecture implemented in 65-nm CMOS and corresponding interfaces to the software that enables mapping of application workloads. The architecture consists of a 590-Kb IMC accelerator, configurable digital near-memory-computing (NMC) accelerator, RISC-V CPU, and other peripherals. To enable programmability, microarchitectural design of the IMC accelerator provides the integration in the standard processor memory space, area- and energy-efficient analog-to-digital conversion for interfacing to NMC, bit-scalable computation (1-8 b), and input-vector sparsity-proportional energy consumption. The IMC accelerator demonstrates excellent matching between computed outputs and idealized software-modeled outputs, at 1b TOPS/W of 192 vertical bar 400 and 1b-TOPS/mm(2) of 0.60 vertical bar 0.24 for MAC hardware, at V-DD of 1.2 vertical bar 0.85 V, both of which scale directly with the bit precision of the input vector and matrix elements. Software libraries developed for application mapping are used to demonstrate CIFAR-10 image classification with a ten-layer CNN, achieving accuracy, throughput, and energy of 89.3%vertical bar 92.4%, 176 vertical bar 23 images/s, and 5.31 vertical bar 105.2 mu J/image, for 1 vertical bar 4 b quantization levels.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据