PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION (2022)

Journal

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION

Volume 20, Issue 1

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3563697

Keywords

Processing-using-memory; processing-in-memory; RISC-V; FPGA; DRAM; memory controllers

Funding

Semiconductor Research Corporation
ETH Future Computing Laboratory

Automated Summary

Commodity DRAM-based processing-using-memory (PuM) techniques can alleviate the data movement bottleneck at low cost, but integrating them into a real system poses unsolved challenges. This paper presents Processing-in-DRAM (PiDRAM), a flexible end-to-end FPGA-based framework for studying and addressing these integration challenges. The authors implement and evaluate two PuM techniques, RowClone and D-RaNGe, on a RISC-V system prototype to demonstrate PiDRAM's flexibility. End-to-end RowClone speeds up bulk copy and initialization by 14.6x and 12.6x, respectively, over conventional CPU copy.

Abstract

Commodity DRAM-based processing-using-memory (PuM) techniques that are supported by off-the-shelf DRAM chips present an opportunity for alleviating the data movement bottleneck at low cost. However, system integration of these techniques imposes non-trivial challenges that are yet to be solved. Potential solutions to the integration challenges require appropriate tools to develop any necessary hardware and software components. Unfortunately, current proprietary computing systems, specialized DRAM-testing platforms, or system simulators do not provide the flexibility and/or the holistic system view that is necessary to properly evaluate and deal with the integration challenges of commodity DRAM-based PuM techniques. We design and develop Processing-in-DRAM (PiDRAM), the first flexible end-to-end framework that enables system integration studies and evaluation of real, commodity DRAM-based PuM techniques. PiDRAM provides software and hardware components to rapidly integrate PuM techniques across the whole system software and hardware stack. We implement PiDRAM on an FPGA-based RISC-V system. To demonstrate the flexibility and ease of use of PiDRAM, we implement and evaluate two state-of-the-art commodity DRAM-based PuM techniques: (i) in-DRAM copy and initialization (RowClone) and (ii) in-DRAM true random number generation (D-RaNGe). We describe how we solve key integration challenges to make such techniques work and be effective on a real-system prototype, including memory allocation, alignment, and coherence. We observe that end-to-end RowClone speeds up bulk copy and initialization operations by 14.6x and 12.6x, respectively, over conventional CPU copy, even when coherence is supported with inefficient cache flush operations. Over PiDRAM's extensible codebase, integrating both RowClone and D-RaNGe end-to-end on a real RISC-V system prototype takes only 388 lines of Verilog code and 643 lines of C++ code.
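To make the alignment and coherence challenges named in the abstract more concrete, the following toy Python model sketches why a row-granularity in-DRAM copy needs row-aligned operands and a cache flush beforehand. It is only an illustration under stated assumptions, not PiDRAM's actual interface: the `ToyMemory` class, its method names, and the 8 KiB `ROW_BYTES` value are all hypothetical, and the "in-DRAM" path is simulated with an ordinary array copy.

```python
ROW_BYTES = 8192  # assumed DRAM row size; the real value depends on the module


def is_row_aligned(addr: int, size: int, row_bytes: int = ROW_BYTES) -> bool:
    """True when the range starts on a row boundary and spans whole rows."""
    return size > 0 and addr % row_bytes == 0 and size % row_bytes == 0


class ToyMemory:
    """Hypothetical model: a flat 'DRAM' array plus a cache of dirty bytes."""

    def __init__(self, size: int) -> None:
        self.dram = bytearray(size)
        self.dirty = {}  # address -> byte value not yet written back to DRAM

    def cpu_write(self, addr: int, value: int) -> None:
        # CPU stores land in the cache first, not in DRAM.
        self.dirty[addr] = value

    def cpu_read(self, addr: int) -> int:
        # Reads prefer the cached (possibly newer) value.
        return self.dirty.get(addr, self.dram[addr])

    def flush(self, addr: int, size: int) -> None:
        # Write back dirty bytes in [addr, addr + size) so DRAM is up to date.
        for a in [a for a in self.dirty if addr <= a < addr + size]:
            self.dram[a] = self.dirty.pop(a)

    def invalidate(self, addr: int, size: int) -> None:
        # Drop cached destination bytes so later reads come from DRAM.
        for a in [a for a in self.dirty if addr <= a < addr + size]:
            del self.dirty[a]

    def bulk_copy(self, src: int, dst: int, size: int) -> str:
        """Copy size bytes (non-overlapping ranges); return the path taken."""
        if is_row_aligned(src, size) and is_row_aligned(dst, size):
            self.flush(src, size)       # coherence: source data must be in DRAM
            self.invalidate(dst, size)  # coherence: discard stale destination data
            self.dram[dst:dst + size] = self.dram[src:src + size]
            return "in-dram"
        # Fallback: ordinary byte-by-byte CPU copy through the cache.
        for i in range(size):
            self.cpu_write(dst + i, self.cpu_read(src + i))
        return "cpu"
```

In this model, skipping the `flush` call would copy stale DRAM contents whenever the newest source data still sits in the cache, which is the coherence problem the abstract refers to; a misaligned request simply falls back to the CPU path.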
