Article

A Digital Processing in Memory Architecture Using TCAM for Rapid Learning and Inference Based on a Spike Location Dependent Plasticity

Journal

IEEE ACCESS
Volume 11, Pages 3416-3430

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/ACCESS.2023.3234323

Keywords

Digital processing in memory; fast training; in memory computation; TCAM

Abstract

In this paper, we present a digital processing-in-memory (DPIM) architecture configured as a stride edge-detection search frequency neural network (SE-SFNN), trained through spike location dependent plasticity (SLDP), a learning mechanism reminiscent of spike timing dependent plasticity (STDP). This mechanism allows rapid online learning as well as a simple memory-based implementation. In particular, we employ a ternary data scheme to take advantage of ternary content addressable memory (TCAM). The scheme uses a ternary representation of the image pixels, and the TCAMs are arranged in a two-layer format to significantly reduce the computation time. The first layer applies several filtering kernels; the second layer reorders the TCAM pattern dictionaries to place the most frequent patterns at the top of each supervised TCAM dictionary. Numerous TCAM blocks in both layers operate in a massively parallel fashion on digital ternary values. No complicated multiply operations are performed, and learning proceeds in a feedforward scheme. This allows rapid and robust learning, traded off against the parallel memory block size. Furthermore, we propose a method to reduce the TCAM memory size using a two-tiered minor-to-major promotion (M2MP) of frequently occurring patterns. This reduction scheme runs concurrently with the learning operation without incurring a preconditioning overhead. We show that, with minimal circuit overhead, the required memory size is reduced by 84.4% and the total clock cycles required for learning decrease by 97.31%, while accuracy decreases by only 1.12%. We classify images with 94.58% accuracy on the MNIST dataset. Using a 100 MHz clock, our simulation results show that MNIST training takes about 6.3 ms while dissipating less than 4 mW of average power. In terms of inference speed, the trained hardware can process 5,882,352 images per second.
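The frequency-ordered TCAM dictionary and the M2MP memory-reduction scheme described in the abstract can be sketched in software as follows. This is a minimal illustrative sketch, not the paper's hardware: the class names, the hit-count bookkeeping, and the promotion threshold are all assumptions made for illustration.

```python
# Minimal software sketch of a ternary CAM (TCAM) dictionary that reorders
# its patterns by match frequency, plus a two-tiered minor-to-major
# promotion (M2MP) of frequently occurring patterns.
# All names and the promotion threshold are illustrative assumptions.

X = None  # don't-care symbol in a ternary pattern


def matches(pattern, key):
    """A stored ternary pattern matches a key if every
    non-don't-care position agrees with it."""
    return all(p is X or p == k for p, k in zip(pattern, key))


class TCAMDictionary:
    def __init__(self):
        self.entries = []  # list of [pattern, hit_count]

    def search(self, key):
        """Return the index of the first matching pattern (priority is
        position in the dictionary), counting the hit; -1 on a miss."""
        for i, entry in enumerate(self.entries):
            if matches(entry[0], key):
                entry[1] += 1
                return i
        return -1

    def learn(self, key):
        """Online learning: store an unseen key as a new exact pattern."""
        if self.search(key) == -1:
            self.entries.append([tuple(key), 1])

    def reorder(self):
        """Place the most frequent patterns at the top of the dictionary."""
        self.entries.sort(key=lambda e: -e[1])


class TwoTierTCAM:
    """Sketch of M2MP: keys are first collected in a small 'minor'
    dictionary and promoted to the 'major' dictionary once they occur
    often enough, keeping the major memory small."""

    def __init__(self, threshold=2):  # threshold is an assumed parameter
        self.major = TCAMDictionary()
        self.minor = TCAMDictionary()
        self.threshold = threshold

    def learn(self, key):
        if self.major.search(key) != -1:
            return  # already a major (frequent) pattern
        idx = self.minor.search(key)
        if idx == -1:
            self.minor.entries.append([tuple(key), 1])
        elif self.minor.entries[idx][1] >= self.threshold:
            # promote the frequently seen minor pattern to the major tier
            self.major.entries.append(self.minor.entries.pop(idx))


tcam = TCAMDictionary()
for key in [(1, 0, 1), (1, 0, 1), (0, 1, 1), (1, 0, 1)]:
    tcam.learn(key)
tcam.reorder()
print(tcam.entries[0])  # most frequent pattern first
```

Because priority in a TCAM is positional, sorting the most frequent patterns to the top mimics the paper's reordering of supervised dictionaries; the two-tier promotion keeps rarely seen patterns out of the (expensive) major dictionary.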

