3.8 Proceedings Paper

FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/HPCA51647.2021.00080

Keywords

-

Funding

  1. Booz Allen Hamilton Inc.
  2. Laboratory for Physical Sciences (LPS)


The study introduces Fafnir, an efficient solution for sparse data gathering that minimizes data movement with an intelligent reduction tree in memory and maximizes parallel memory accesses through near-data processing. Because Fafnir does not rely on spatial locality, it offers higher efficiency and faster performance than existing NDP proposals.
Memory-bound sparse gathering, caused by irregular random memory accesses, has become an obstacle in several on-demand applications such as embedding lookup in recommendation systems. To reduce the amount of data movement, and thereby better utilize memory bandwidth, previous studies have proposed near-data processing (NDP) solutions. The issue with prior work, however, is that it either minimizes data movement effectively at the cost of limited memory parallelism, or improves memory parallelism (up to a degree) but cannot successfully decrease data movement, because it relies on spatial locality (an optimistic assumption) to utilize NDP. More importantly, neither approach proposes a solution for gathering data from random memory addresses; they merely offload operations to NDP. We propose an effective solution for sparse gathering: an efficient near-memory intelligent reduction (Fafnir) tree, whose leaves are all the ranks in a memory system and whose nodes gradually apply reduction operations as data is gathered from any rank. Because the tree spans the whole memory system, Fafnir does not rely on spatial locality; it therefore minimizes data movement by performing entire operations at NDP and fully benefits from parallel memory accesses through parallel processing at NDP. Further, Fafnir offers other advantages such as requiring fewer connections (because of the tree topology), eliminating redundant memory accesses without costly and less effective caching mechanisms, and being applicable to other domains of sparse problems such as scientific computing and graph analytics. To evaluate Fafnir, we implement it on an XCVU9P Xilinx FPGA and in a 7 nm ASAP7 ASIC process. Fafnir looks up embedding tables up to 21.3x more quickly than the state-of-the-art NDP proposal. Furthermore, the generic architecture of Fafnir allows running classic sparse problems on the same 1.2 mm² hardware up to 4.6x more quickly than the state of the art.
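
To make the reduction-tree idea concrete, the following is a minimal software sketch of a Fafnir-style gather-and-reduce, assuming an embedding table sharded row-wise across memory ranks; every name (NUM_RANKS, leaf_partial_sum, tree_reduce, etc.) is hypothetical and the sketch only models the dataflow, not the paper's hardware design.

import numpy as np

NUM_RANKS = 8         # leaves of the reduction tree (memory ranks)
TABLE_ROWS = 1 << 12  # embedding-table rows spread across ranks
DIM = 64              # embedding vector width

# Embedding table sharded row-wise: row r is assumed to live in rank r % NUM_RANKS.
rng = np.random.default_rng(0)
table = rng.standard_normal((TABLE_ROWS, DIM)).astype(np.float32)

def rank_of(row: int) -> int:
    return row % NUM_RANKS

def leaf_partial_sum(rank: int, indices: list[int]) -> np.ndarray:
    """Near-rank reduction: sum only the rows that reside in this rank."""
    local = [i for i in indices if rank_of(i) == rank]
    if not local:
        return np.zeros(DIM, dtype=np.float32)
    return table[local].sum(axis=0)

def tree_reduce(partials: list[np.ndarray]) -> np.ndarray:
    """Combine partial sums pairwise, level by level, as a tree of adders would."""
    level = partials
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

def gather_reduce(indices: list[int]) -> np.ndarray:
    partials = [leaf_partial_sum(r, indices) for r in range(NUM_RANKS)]
    return tree_reduce(partials)

# One embedding-bag lookup: the sum of a sparse set of table rows.
lookup = [3, 17, 42, 1023, 2048]
result = gather_reduce(lookup)
assert np.allclose(result, table[lookup].sum(axis=0))

In this model, each rank contributes only a partial sum of its local rows and the partials merge on the way up the tree, so no individual gathered vector has to cross the memory bus before it is reduced; that is the intuition behind why the approach avoids depending on spatial locality.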

