Journal
IEEE TRANSACTIONS ON RELIABILITY
Volume 69, Issue 2, Pages 594-610
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TR.2019.2923258
Keywords
Reliability; Error analysis; Radio frequency; Magnetic tunneling; Random access memory; Torque; Switches; Cache memory; error rate; process variations (PVs); read disturbance; retention failure (RF); spin transfer torque magnetic RAM (STT-MRAM); write failure
Funding
- Iran National Science Foundation [96006071]
- Iran National Elites Foundation
Spin-transfer torque magnetic RAM (STT-MRAM) is known as the most promising replacement for static random access memory (SRAM) technology in large last-level caches (LLCs). Despite its major advantages of high density, nonvolatility, near-zero leakage power, and immunity to radiation, STT-MRAM-based cache memory suffers from high error rates, mainly due to retention failure (RF), read disturbance, and write failure. Existing studies estimate the rate of only one or two of these error types for STT-MRAM caches. The overall vulnerability of STT-MRAM caches, which must be estimated in order to design cost-efficient reliable caches, has not been studied previously. In this paper, we propose a system-level framework for reliability exploration and for characterizing error behavior in STT-MRAM caches. To this end, we formulate the cache vulnerability considering the intercorrelation of the error types (RF, read disturbance, and write failure) as well as the dependence of error rates on workload behavior and process variations (PVs). Our analysis reveals that STT-MRAM cache vulnerability is highly workload-dependent and varies by orders of magnitude across cache access patterns. Our analytical study also shows that PVs in STT-MRAM cells significantly increase this vulnerability divergence. To take the effects of system workloads and PVs into account, we implement the error types in the gem5 full-system simulator. Experimental results using a comprehensive set of multiprogrammed workloads from the SPEC CPU2006 benchmark suite on a quad-core processor show that the total error rate in a shared STT-MRAM LLC varies by 32.0x across workloads. A further 6.5x vulnerability variation is observed when PVs in the STT-MRAM cells are considered. In addition, the contribution of each error type to the total LLC vulnerability varies widely across cache access patterns, and PVs affect the individual error rates differently. The proposed analytical and empirical studies can significantly help system architects use error mitigation techniques efficiently and design highly reliable, low-cost STT-MRAM LLCs.
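To illustrate why error rates can depend so strongly on access patterns and on process variations, the following is a minimal sketch based on the textbook thermal-activation model of a magnetic tunnel junction. It is not the formulation used in the paper, which is not given in this abstract. The function names, the 1 ns attempt period, the read pulse width, and the `read_ratio` parameter (read current divided by critical current) are all assumptions chosen for illustration, and write failure and the intercorrelation of error types are deliberately left out.

```python
import math

TAU0_S = 1e-9  # thermal attempt period; ~1 ns is a commonly assumed value


def retention_failure_prob(idle_s, delta):
    """Probability that a cell flips thermally while left idle for idle_s seconds.

    delta is the thermal stability factor; process variation shows up as a
    cell-to-cell spread in delta (assumed model, not taken from the paper).
    """
    return -math.expm1(-idle_s / (TAU0_S * math.exp(delta)))


def read_disturb_prob(n_reads, delta, read_ratio, t_read_s=1e-9):
    """Probability that at least one of n_reads read pulses flips the cell.

    read_ratio is the hypothetical ratio I_read / I_c0 (0 <= read_ratio < 1).
    """
    per_read_rate = (t_read_s / TAU0_S) * math.exp(-delta * (1.0 - read_ratio))
    return -math.expm1(-n_reads * per_read_rate)


if __name__ == "__main__":
    # Workload effect: the same cell left idle for 1 ms versus 1 s.
    for idle in (1e-3, 1.0):
        print(f"idle={idle:g}s  P_RF={retention_failure_prob(idle, 40):.3e}")
    # Process-variation effect: nominal delta=40 versus a weaker cell, delta=36.
    for delta in (40, 36):
        print(f"delta={delta}  P_RF(1s)={retention_failure_prob(1.0, delta):.3e}")
    # Read-heavy versus read-light blocks.
    for n in (1, 1000):
        print(f"reads={n}  P_RD={read_disturb_prob(n, 40, 0.5):.3e}")
```

Under these assumptions, both probabilities depend exponentially on the thermal stability factor, so a modest spread in cell parameters or a change in idle time and read count shifts the error probability by large factors. This is consistent in spirit with the workload and PV sensitivity reported above, although the numbers printed here are not the paper's results.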