Article

10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors

Journal

IEEE ACCESS
Volume 9, Pages 71262-71276

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3079425

Keywords

Random access memory; Throughput; Energy efficiency; Transistors; Program processors; Parallel processing; Layout; Computing-in-memory; static random access memory; deep neural network; machine learning; edge processor

Funding

  1. Basic Science Research Program through the National Research Foundation of Korea [2021R1A2B5B01001475]
  2. National Research and Development Program through the National Research Foundation of Korea - Ministry of Science and ICT [2020M3H2A1076786]
  3. National Research Foundation of Korea [2021R1A2B5B01001475] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Abstract

This paper introduces a scalable computing-in-memory design approach based on a new ten-transistor (10T) static random access memory (SRAM) bit-cell that supports multibit and binary MAC operations with high throughput and energy efficiency. The presented designs address the write-disturbance, throughput, and ADC power issues of earlier work, improve inference accuracy, resolve the CDAC area problem, and demonstrate four-way parallel 2-bit multiplication using the CDAC.
Computing-in-memory (CIM) is a promising approach to reducing latency and improving the energy efficiency of the multiply-and-accumulate (MAC) operation under the memory-wall constraint of artificial intelligence (AI) edge processors. This paper proposes scalable CIM designs based on a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting multibit and binary MAC operations. The first design achieves fully parallel computing and high throughput using 32 parallel binary MAC operations. Advanced circuit techniques such as an input-dependent dynamic reference generator and an input-boosted sense amplifier are presented. Fabricated in a 28 nm CMOS process, this design achieves 409.6 GOPS throughput, 1001.7 TOPS/W energy efficiency, and 169.9 TOPS/mm² throughput-area efficiency. The proposed approach effectively solves previous problems such as write disturbance, limited throughput, and the power consumption of the analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase inference accuracy. We propose an architecture that divides the 4-b weight by 4-b input multiplication into four 2-b multiplications performed in parallel, which increases the signal margin by 16× compared with conventional 4-b multiplication. In addition, the capacitive digital-to-analog converter (CDAC) area issue is effectively addressed by using the intrinsic bit-line capacitance already present in the SRAM-CIM architecture. The proposed approach of realizing four parallel 2-b multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network. These results show that the proposed 10T bit-cell is promising for realizing robust and scalable SRAM-CIM designs, which are essential for fully parallel edge computing.
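To make the multibit scheme concrete, the sketch below (plain Python written for this summary, not code from the paper or a model of the authors' circuit) shows how a 4-b × 4-b product can be assembled from four 2-b × 2-b partial products, alongside an XNOR/popcount-style binary MAC for comparison; the function and variable names (`mac_4b_via_2b`, `mac_binary`, `w_hi`, `x_lo`) are illustrative only.

```python
# Illustrative arithmetic only: decomposition of a 4-b x 4-b MAC into four
# 2-b x 2-b partial products, plus a simple binary (XNOR/popcount) MAC.

def mac_4b_via_2b(weights, inputs):
    """4-b x 4-b MAC computed from four 2-b x 2-b partial products per pair."""
    acc = 0
    for w, x in zip(weights, inputs):
        w_hi, w_lo = (w >> 2) & 0b11, w & 0b11   # split 4-b weight into two 2-b halves
        x_hi, x_lo = (x >> 2) & 0b11, x & 0b11   # split 4-b input into two 2-b halves
        # Four small products, shifted by their binary weights and summed:
        acc += (w_hi * x_hi << 4) + (w_hi * x_lo << 2) + (w_lo * x_hi << 2) + (w_lo * x_lo)
    return acc

def mac_binary(weights, inputs):
    """Binary MAC with +/-1 values encoded as 1/0: count XNOR matches, map back."""
    n = len(weights)
    matches = sum(1 for w, x in zip(weights, inputs) if w == x)  # XNOR agreement count
    return 2 * matches - n                                       # result in the +/-1 domain

if __name__ == "__main__":
    ws, xs = [13, 7, 2, 15], [9, 3, 14, 6]  # unsigned 4-b operands
    assert mac_4b_via_2b(ws, xs) == sum(w * x for w, x in zip(ws, xs))
    print(mac_4b_via_2b(ws, xs), mac_binary([1, 0, 1, 1], [1, 1, 1, 0]))
```

The assertion checks that summing the four shifted 2-b partial products reproduces the full 4-b product; each partial product spans a much smaller range than the full multiplication, which is the arithmetic basis for the 16× signal-margin improvement claimed in the abstract.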
