Journal
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Volume 12, Issue 4, Pages -
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/2836168
Keywords
Load value approximation; GPUs; value prediction; memory latency; memory bandwidth; Design; Algorithms; Performance
Funding
- Qualcomm Innovation Fellowship
- Microsoft Research PhD Fellowship
- Nvidia
- NSF [1409723, 1423172, 1212962]
- CCF [1553192]
- Semiconductor Research Corporation [2014-EP-2577]
Abstract
This article tackles two fundamental memory bottlenecks: limited off-chip bandwidth (the bandwidth wall) and long access latency (the memory wall). To this end, our approach exploits the inherent error resilience of a wide range of applications. We introduce an approximation technique called Rollback-Free Value Prediction (RFVP). When certain safe-to-approximate load operations miss in the cache, RFVP predicts the requested values. However, RFVP does not check for or recover from load-value mispredictions, thereby avoiding the high cost of pipeline flushes and re-executions. RFVP mitigates the memory wall by enabling execution to continue without stalling for long-latency memory accesses. To mitigate the bandwidth wall, RFVP drops a fraction of the load requests that miss in the cache after predicting their values. Dropping requests removes them from the memory system and thus reduces memory bandwidth contention. The drop rate is a knob that controls the trade-off between performance/energy efficiency and output quality. Our extensive evaluations show that RFVP, when used in GPUs, yields significant performance improvement and energy reduction for a wide range of quality-loss levels. We also evaluate RFVP's latency benefits for a single-core CPU. The results show performance improvement and energy reduction for a wide variety of applications with less than 1% loss in quality.
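The mechanism described in the abstract can be illustrated with a small software model. The sketch below is a toy, not the hardware design evaluated in the article: the class name `RollbackFreeLoadUnit`, the dictionary-based cache, and the last-value predictor are all hypothetical choices made for illustration, and the assumption that non-dropped requests fill the cache and train the predictor is not stated in the abstract. It shows only the two ideas the abstract names: a predicted value is returned on an approximable miss without ever being verified, and a configurable fraction of those misses never reaches memory.

```python
import random


class RollbackFreeLoadUnit:
    """Toy model of a load path that approximates values on cache misses.

    All names and the last-value predictor are illustrative assumptions.
    """

    def __init__(self, memory, drop_rate, seed=0):
        if not 0.0 <= drop_rate <= 1.0:
            raise ValueError("drop_rate must be in [0, 1]")
        self.memory = memory        # backing store: address -> value
        self.cache = {}             # unbounded toy cache: address -> value
        self.last_value = 0         # last-value predictor state (assumption)
        self.drop_rate = drop_rate  # fraction of approximable misses dropped
        self.rng = random.Random(seed)
        self.memory_requests = 0    # requests actually sent to memory

    def _fetch(self, addr):
        """Send a request to memory, fill the cache, and train the predictor."""
        self.memory_requests += 1
        value = self.memory[addr]
        self.cache[addr] = value
        self.last_value = value
        return value

    def load(self, addr, approximable):
        # Cache hit: return the exact value.
        if addr in self.cache:
            self.last_value = self.cache[addr]
            return self.last_value
        # Precise miss: the exact value from memory is required.
        if not approximable:
            return self._fetch(addr)
        # Approximable miss: hand back a prediction that is never checked,
        # so there is nothing to roll back if it turns out to be wrong.
        predicted = self.last_value
        # Drop the request with probability drop_rate; otherwise still
        # fetch so later accesses hit and the predictor keeps learning.
        if self.rng.random() >= self.drop_rate:
            self._fetch(addr)
        return predicted
```

With `drop_rate=1.0` every approximable miss is dropped, so the model sends no requests to memory and returns only predictions; with `drop_rate=0.0` every miss is still fetched, so the model only hides latency and saves no bandwidth. Intermediate values trade output quality against memory traffic, which is the knob the abstract describes.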