4.4 Article

High-Performance STT-MRAM-Based Computing-in-Memory Scheme Utilizing Data Read Feature

Journal

IEEE TRANSACTIONS ON NANOTECHNOLOGY
Volume 22, Issue -, Pages 817-826

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNANO.2023.3336910

Keywords

Computing-in-memory; STT-MRAM; boolean logics; BNN; XNOR-PC


As AI and binary neural networks (BNNs) advance, traditional computing systems face memory and power bottlenecks. To improve computing efficiency, a computing-in-memory (CiM) architecture built on STT-MRAM is proposed. By exploiting the read characteristics of STT-MRAM and slightly modifying the peripheral circuitry, it achieves higher performance and lower energy consumption.
With the development of Artificial Intelligence (AI) and Binary Neural Networks (BNNs), computing systems are expected to become far more efficient. However, the enormous volume of data to be processed has created intolerable 'memory wall' and 'power wall' challenges for traditional Von Neumann architectures. More advanced Computing-in-Memory (CiM) architectures have therefore been proposed. Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM), an emerging non-volatile memory with fast access speed, near-zero leakage power consumption, and high density, is one of the most competitive carriers for CiM architectures.
Due to the inherent read-write asymmetry of STT-MRAM, the read overhead is several times smaller than the write overhead. To better exploit the potential of MRAM, we focus on implementing a CiM scheme based on the read process.
This work introduces the principle of CiM and proposes four basic logic operations (XNOR, XOR, AND, and OR) based on STT-MRAM. Furthermore, a delay-based XNOR-popcount (XNOR-PC) operation for BNNs, built on the XNOR operation, is proposed. By incorporating the read characteristics of STT-MRAM and making slight modifications to the peripheral circuitry, these operations achieve significant improvements in performance and energy consumption. Simulation results show that the proposed scheme reduces the latency of the XOR, AND, and OR operations by at least 99.3%, 82.2%, and 80.2%, respectively, compared with the existing design. The proposed XNOR-PC operation achieves at least a 59.4% reduction in power consumption and is highly reliable. Monte Carlo simulations also demonstrate the feasibility and robustness of the schemes.
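For readers unfamiliar with XNOR-popcount, the following is a minimal software sketch of the arithmetic it performs in a BNN. It is not the paper's delay-based circuit. It only illustrates the well-known identity that, for two length-n vectors with entries in {-1, +1} encoded as bits (+1 as 1, -1 as 0), the dot product equals 2 * popcount(XNOR(a, b)) - n. The function names `to_bits`, `xnor_popcount`, and `binary_dot` are hypothetical and chosen here for illustration.

```python
# Software model of XNOR-popcount for binarized vectors.
# Assumption: +1 is encoded as bit 1 and -1 as bit 0; all names are illustrative.

def to_bits(values):
    """Pack a list of +1/-1 values into an integer, element i at bit i."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_popcount(a_bits, b_bits, n):
    """Count positions (out of n) where the two bit vectors agree."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask    # 1 wherever the bits are equal
    return bin(xnor).count("1")

def binary_dot(a_bits, b_bits, n):
    """Dot product of the underlying +1/-1 vectors via XNOR-popcount."""
    return 2 * xnor_popcount(a_bits, b_bits, n) - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(to_bits(a), to_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
print(result, reference)
```

Both printed values agree, which is why a BNN layer can replace multiply-accumulate with bitwise XNOR followed by a population count.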

