4.6 Article

Time Domain Analog Neuromorphic Engine Based on High-Density Non-Volatile Memory in Single-Poly CMOS

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 49154-49166

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3172488

Keywords

Analog computing; analog memory; analog neural networks; analog non-volatile memory; computing in-memory; neuromorphic computing

Funding

  1. EC Horizon 2020 Research and Innovation Programme under GA QUEFORMAL Grant [829035]

Ask authors/readers for more resources

Improving the energy efficiency of deep learning systems is crucial for edge devices and data centers. By utilizing analog vector-matrix multipliers, the energy consumption per operation can be significantly reduced. We present a design of an analog VMM using industry-standard technology, which achieves high performance in terms of energy efficiency.
Increasing the energy efficiency of deep learning systems is critical for improving the cognitive capability of edge devices, often battery operated, as well as for data centers, constrained by the total power envelope. Specialized architectures accelerated by analog vector-matrix multipliers (VMMs) can reduce by orders of magnitude the energy per operation, since the reduced precision of analog computation does not undermine the classification accuracy of the neural network. We show an analog vector-matrix multiplier fabricated with industry-standard 0.18 mu m CMOS process, exploiting a single-transistor non-volatile analog memory cell and dedicated technology circuit co-design. The design is focused on implementation in neural networks performing offline training. The VMM performs the analog multiplication of a vector of inputs, encoded in the duration of time pulses, times a matrix of weights, encoded in the programmable currents of the memory cells. A 1.72 mu m(2) memory cell is realized with a single transistor with floating gate, which can be operated as a two-terminal analog memristive device with more than 64 programmable current levels and high I-high/I-low ratio (> 10(3)), tuned by the charge injected in the floating gate. A small-area charge amplifier is used to convert the multiply and accumulate operation result into a voltage. System-level projections based on our measurements and simulations provide a throughput of 333.17 GOps/s and an energy efficiency of 122.3 TOps/J, higher than comparable-precision VMMs reported in the literature, and an equivalent area per cell down to 2.15 mu m(2), lower than any similar state-of-the-art solution. Of critical importance in view of translation to industry, our proposal uses in a new way an industry-standard low-cost single-poly CMOS process flow.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available