Article

TRiM: Tensor Reduction in Memory

Journal

IEEE COMPUTER ARCHITECTURE LETTERS
Volume 20, Issue 1, Pages 5-8

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/LCA.2020.3042805

Keywords

DRAM; in-memory processing; near-data processing

Funding

  1. Engineering Research Center Program through the National Research Foundation of Korea (NRF) - Korean Government MSIT [NRF-2018R1A5A1059921]


Personalized recommendation systems are important in industry, and their embedding layers are memory-intensive. To address the resulting memory-throughput bottleneck, the authors propose a fine-grained near-data processing architecture for DRAM that places in-DRAM reduction units at different levels of the DRAM hierarchy (rank, bank group, bank), achieving significant performance improvements. They also introduce hot embedding-vector replication to alleviate load imbalance across the reduction units.
Personalized recommendation systems are gaining significant traction due to their industrial importance. An important building block of recommendation systems is the embedding layer, which is highly memory-intensive. The fundamental primitives of embedding layers are embedding-vector gathers followed by vector reductions; these exhibit low arithmetic intensity and become bottlenecked by memory throughput. To address this issue, recent proposals in this research space employ a near-data processing (NDP) solution at the DRAM rank level, achieving significant speedups. We observe that prior NDP solutions based on rank-level parallelism leave significant performance on the table, as they do not fully exploit the abundant data-transfer throughput inherent in DRAM datapaths. Based on the observation that the DRAM datapath has a hierarchical tree structure, we propose a novel, fine-grained NDP architecture for recommendation systems, which augments the DRAM datapath with an in-DRAM reduction unit at the DDR4/5 rank/bank-group/bank level, achieving significant performance improvements over state-of-the-art approaches. We also propose hot embedding-vector replication to alleviate the load imbalance across the reduction units.
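The two mechanisms in the abstract, gather-and-reduce over a tree-shaped DRAM datapath and hot embedding-vector replication, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the TRiM hardware: the DRAM geometry, the round-robin row placement (`home_bank`), the greedy least-loaded routing of hot lookups (`assign`), and all function names are hypothetical. The abstract does not say how reduction units at different levels are combined, so treating every level (bank, bank group, rank) as a reduction stage is also an assumption. Integer vectors are used so that the tree result can be compared exactly with a flat host-side sum.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical DRAM geometry, chosen only for illustration (not from the paper).
NUM_RANKS = 2
BANK_GROUPS_PER_RANK = 4
BANKS_PER_GROUP = 4
DIM = 4  # embedding-vector length

Bank = Tuple[int, int, int]  # (rank, bank_group, bank)
ALL_BANKS: List[Bank] = [
    (r, g, b)
    for r in range(NUM_RANKS)
    for g in range(BANK_GROUPS_PER_RANK)
    for b in range(BANKS_PER_GROUP)
]


def home_bank(row: int) -> Bank:
    """Assumed placement: embedding rows are interleaved round-robin over banks."""
    return ALL_BANKS[row % len(ALL_BANKS)]


def vec_add(a: List[int], b: List[int]) -> List[int]:
    return [x + y for x, y in zip(a, b)]


def flat_reduce(table: List[List[int]], indices: List[int]) -> List[int]:
    """Baseline: gather every vector to the host and sum it there."""
    out = [0] * DIM
    for i in indices:
        out = vec_add(out, table[i])
    return out


def assign(indices: List[int], hot: FrozenSet[int] = frozenset()):
    """Choose a serving bank for each lookup and count per-bank load.

    Cold rows are served by their home bank. Rows in `hot` are assumed to be
    replicated in every bank, so each hot lookup goes to the least-loaded bank.
    """
    load: Dict[Bank, int] = {bank: 0 for bank in ALL_BANKS}
    plan: List[Tuple[int, Bank]] = []
    for i in indices:  # route cold lookups first
        if i not in hot:
            bank = home_bank(i)
            load[bank] += 1
            plan.append((i, bank))
    for i in indices:  # then spread hot lookups over the replicas
        if i in hot:
            bank = min(load, key=load.get)
            load[bank] += 1
            plan.append((i, bank))
    return plan, load


def tree_reduce(table: List[List[int]], plan: List[Tuple[int, Bank]]) -> List[int]:
    """Forward partial sums up the tree: bank -> bank group -> rank -> host."""
    def combine(partials, key_fn):
        merged: Dict = {}
        for key, vec in partials.items():
            k = key_fn(key)
            merged[k] = vec_add(merged.get(k, [0] * DIM), vec)
        return merged

    bank_sums: Dict[Bank, List[int]] = {}
    for i, bank in plan:  # level 1: each bank reduces the vectors it serves
        bank_sums[bank] = vec_add(bank_sums.get(bank, [0] * DIM), table[i])
    group_sums = combine(bank_sums, lambda k: (k[0], k[1]))  # level 2: bank group
    rank_sums = combine(group_sums, lambda k: k[0])  # level 3: rank
    host = combine(rank_sums, lambda k: None)  # host combines rank partials
    return host.get(None, [0] * DIM)


table = [[row * DIM + d for d in range(DIM)] for row in range(64)]
indices = [0] * 20 + list(range(1, 9))  # skewed batch: row 0 is "hot"

plan, load = assign(indices)
plan_hot, load_hot = assign(indices, hot=frozenset({0}))
print(max(load.values()), max(load_hot.values()))  # 20 1
print(tree_reduce(table, plan_hot) == flat_reduce(table, indices))  # True
```

Because addition is associative, forwarding partial sums up the tree yields the same vector as the flat host-side sum, while in this model each bank sends only one partial vector upward. The skewed batch concentrates 20 of its 28 lookups on a single bank until the hot row is treated as replicated, after which no bank serves more than one lookup.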


