4.3 Article Proceedings Paper

CASCADE: High Throughput Data Streaming via Decoupled Access-Execute CGRA

Journal

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3358177

Keywords

Coarse grained reconfigurable arrays; multi-bank memory partitioning; decoupled access-execute architectures

Funding

  1. National Research Foundation, Prime Minister's Office, Singapore [NRF2015-IIP003]
  2. Huawei International Pte. Ltd.

Ask authors/readers for more resources

A Coarse-Grained Reconfigurable Array (CGRA) is a promising high-performance low-power accelerator for compute-intensive loop kernels. While the mapping of the computations on the CGRA is a well-studied problem, bringing the data into the array at a high throughput remains a challenge. A conventional CGRA design involves on-array computations to generate memory addresses for data access undermining the attainable throughput. A decoupled access-execute architecture, on the other hand, isolates the memory access from the actual computations resulting in a significantly higher throughput. We propose a novel decoupled access-execute CGRA design called CASCADE with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory. CASCADE offloads the address computations for the multi-bank data memory access to a custom designed programmable hardware. An end-to-end fully-automated compiler synchronizes the conflict-free movement of data between the memory banks and the CGRA. Experimental evaluations show on average 3x performance benefit and 2.2x performance per watt improvement for CASCADE compared to an iso-area conventional CGRA with a bigger processing array in lieu of a dedicated hardware memory address generation logic.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.3
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available