
ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING (2023)

Journal

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING

Volume 11, Issue 2, Pages 388-403

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TETC.2022.3226132

Keywords

Near-data processing; inter-segment data movement; application partitioning

Automated Summary

Partitioning applications between near-data processing (NDP) and host CPU cores causes inter-segment data movement overhead. ALP, a programmer-transparent technique, mitigates this overhead by proactively and accurately transferring the required data between segments, relying on the observation that the instructions generating inter-segment data stay the same across executions of a program. Evaluated across a wide range of workloads, ALP achieves 54.3% and 45.4% average speedup over CPU-only and NDP-only executions, respectively.

Abstract

Partitioning applications between near-data processing (NDP) and host CPU cores causes inter-segment data movement overhead, which arises when data generated by one segment (e.g., instructions, functions) is used in other consecutive segments. Prior works take two approaches to this problem. The first maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second partitions applications based on the overall memory bandwidth savings, and does not offload a segment to its best-fitting core if doing so incurs high inter-segment data movement. We show that 1) ideally mapping each segment to its best-fitting core can provide substantial benefits, and 2) inter-segment data movement reduces this benefit significantly. We introduce ALP, a new programmer-transparent technique to alleviate the inter-segment data movement overhead between host and memory in NDP systems. ALP proactively and accurately transfers the required data between segments, based on the key observation that the instructions that generate the inter-segment data stay the same across different executions of a program. ALP uses a compiler pass to identify these instructions and specialized hardware to transfer the data they produce at runtime. We evaluate ALP across a wide range of workloads and demonstrate 54.3% and 45.4% average speedup over CPU-only and NDP-only executions, respectively.
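The key observation in the abstract, that a fixed set of static instructions produces the inter-segment data, can be illustrated with a toy example. The sketch below is not the paper's compiler pass. It assumes a hypothetical straight-line intermediate representation in which every instruction carries a segment index, and it marks an instruction as an inter-segment producer when a later instruction in a different segment reads the value it wrote. The names `Instr` and `inter_segment_producers` and the fields `pc`, `segment`, `dest`, and `srcs` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction in a hypothetical straight-line toy IR (an assumption)."""
    pc: int                 # static instruction address
    segment: int            # index of the segment this instruction belongs to
    dest: str               # name of the value written ("" if none)
    srcs: Tuple[str, ...]   # names of the values read


def inter_segment_producers(program: List[Instr]) -> Set[int]:
    """Return the pcs of instructions whose results are read in another segment."""
    last_writer: Dict[str, Instr] = {}  # most recent writer of each value
    producers: Set[int] = set()
    for ins in program:
        for src in ins.srcs:
            writer = last_writer.get(src)
            # A read that crosses a segment boundary marks its writer.
            if writer is not None and writer.segment != ins.segment:
                producers.add(writer.pc)
        if ins.dest:
            last_writer[ins.dest] = ins
    return producers


# Toy program with three consecutive segments (0, 1, 2).
program = [
    Instr(pc=0, segment=0, dest="a", srcs=()),
    Instr(pc=1, segment=0, dest="b", srcs=("a",)),
    Instr(pc=2, segment=1, dest="c", srcs=("b",)),
    Instr(pc=3, segment=1, dest="d", srcs=("c",)),
    Instr(pc=4, segment=2, dest="e", srcs=("d", "a")),
]

print(sorted(inter_segment_producers(program)))  # prints [0, 1, 3]
```

In this toy program the instruction at pc 2 is not marked, because its result is only read inside its own segment. Because the returned set depends only on the static program, it is the same for every run of the program, which mirrors the property the abstract says ALP relies on. The hardware that transfers the produced data at runtime is outside the scope of this sketch.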
