Article

An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

Journal

IEEE Transactions on Circuits and Systems I: Regular Papers

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TCSI.2021.3134313

Keywords

SONOS devices; Neural networks; Transistors; Logic gates; Programming; Memristors; Analog memory; SONOS; charge trap memory; neuromorphic; neural network; analog; in-memory computing; inference accelerator

Funding

  1. Laboratory Directed Research and Development Program at Sandia National Laboratories
  2. Defense Threat Reduction Agency (DTRA) [HDTRA1-17-1-0038]

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, matching the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy of the SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10x gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
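The error-injection methodology the abstract describes (weights mapped to cell conductances, then perturbed by programming error and read noise before each matrix-vector multiply) can be illustrated with a minimal NumPy sketch. This is not the authors' simulator: the differential-pair weight mapping is a standard analog in-memory computing convention, and the device parameters (`G_MAX`, `SIGMA_PROG`, `SIGMA_READ`) are hypothetical, chosen only to show how absolute conductance errors translate into small errors for the near-zero weights that dominate neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical device parameters (illustrative only, not from the paper).
G_MAX = 1e-6        # maximum cell conductance, in siemens
SIGMA_PROG = 0.01   # programming error std, as a fraction of G_MAX
SIGMA_READ = 0.005  # per-read noise std, as a fraction of G_MAX

def weights_to_conductances(W):
    """Map signed weights onto a differential conductance pair (G+, G-).

    Positive weights are programmed on G+, negative weights on G-, so
    the difference of the two column currents encodes W @ x. Near-zero
    weights map to near-zero conductances, which subthreshold operation
    can realize with low error.
    """
    scale = G_MAX / np.max(np.abs(W))
    G_pos = np.clip(W, 0.0, None) * scale
    G_neg = np.clip(-W, 0.0, None) * scale
    return G_pos, G_neg, scale

def program(G):
    """Apply programming error: additive Gaussian, absolute in conductance."""
    err = rng.normal(0.0, SIGMA_PROG * G_MAX, G.shape)
    return np.clip(G + err, 0.0, G_MAX)

def analog_mvm(G_pos, G_neg, x, scale):
    """One in-memory matrix-vector multiply with fresh read noise per access."""
    read = lambda G: G + rng.normal(0.0, SIGMA_READ * G_MAX, G.shape)
    i_pos = read(G_pos) @ x   # column currents from the positive array
    i_neg = read(G_neg) @ x   # column currents from the negative array
    return (i_pos - i_neg) / scale

# Weight matrix with the typical near-zero-skewed distribution.
W = rng.normal(0.0, 0.05, (64, 128))
x = rng.normal(0.0, 1.0, 128)

G_pos, G_neg, scale = weights_to_conductances(W)
y_analog = analog_mvm(program(G_pos), program(G_neg), x, scale)
y_ideal = W @ x
rel_err = np.linalg.norm(y_analog - y_ideal) / np.linalg.norm(y_ideal)
print(f"relative MVM error: {rel_err:.4f}")
```

In a full accuracy simulation, a perturbation of this kind would be applied to every weight layer of the network before evaluating the test set; the abstract's 2.16% accuracy gap is the end-to-end result of such an evaluation using measured, rather than assumed, error statistics.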

