Proceedings Paper

Ultra-Elastic CGRAs for Irregular Loop Specialization

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/HPCA51647.2021.00042

Funding

  1. DARPA SDH Award [FA8650-18-2-7863]
  2. DARPA POSH Award [FA8650-18-27852]
  3. NSF CRI Award [1512937]
  4. Center for Applications Driving Architectures (ADA)
  5. DARPA

This paper addresses irregular loop specialization on coarse-grain reconfigurable arrays (CGRAs) and proposes the ultra-elastic CGRA (UE-CGRA), an elastic CGRA that accelerates true-dependency bottlenecks and saves energy in irregular loops. UE-CGRAs support configurable, fine-grain dynamic voltage and frequency scaling (DVFS) for each processing element (PE), which lets them improve either performance or energy efficiency relative to traditional CGRAs and a RISC-V core.
Reconfigurable accelerator fabrics, including coarse-grain reconfigurable arrays (CGRAs), have experienced a resurgence in interest because they allow fast-paced software algorithm development to continue evolving post-fabrication. CGRAs traditionally target regular workloads with data-level parallelism (e.g., neural networks, image processing), but once integrated into an SoC, they sit idle during irregular workloads. An emerging trend toward repurposing these idle resources raises important questions about how to efficiently map and execute general-purpose loops that may have irregular memory accesses, irregular control flow, and inter-iteration loop dependencies. Recent work has increasingly leveraged elasticity in CGRAs to mitigate the first two challenges, but elasticity alone does not address inter-iteration loop dependencies, which can easily bottleneck overall performance. In this paper, we address all three challenges for irregular loop specialization and propose ultra-elastic CGRAs (UE-CGRAs), a novel elastic CGRA that accelerates true-dependency bottlenecks and saves energy in irregular loops by overcoming traditional VLSI challenges. UE-CGRAs allow configurable fine-grain dynamic voltage and frequency scaling (DVFS) for each of potentially hundreds of tiny processing elements (PEs) in the CGRA. This enables chains of connected PEs to rest at lower voltages and frequencies to save energy, while other chains of connected PEs sprint at higher voltages and frequencies to accelerate through true-dependency bottlenecks. UE-CGRAs rely on a novel ratiochronous clocking scheme carefully overlaid on the inter-PE elastic interconnect to enable low-latency crossings while remaining fully verifiable with commercial static timing analysis tools.
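To make the ratiochronous idea more concrete, here is a minimal Python sketch. It assumes, as a simplification not taken from the paper, that every PE clock is an integer division of one shared base clock. All rising edges then fall on a common grid, and the relative edge pattern between any two PEs repeats every lcm of their dividers, so a clock-domain crossing has only a finite set of distinct launch/capture edge pairs that a static timing tool could enumerate. The function names and the `min_gap` timing-budget parameter are hypothetical.

```python
from math import lcm


def rising_edges(divider: int, hyperperiod: int) -> list[int]:
    """Rising-edge times, in base-clock ticks, of a clock equal to base clock / divider."""
    return list(range(0, hyperperiod, divider))


def crossing_latencies(send_div: int, recv_div: int, min_gap: int = 1) -> tuple[int, dict[int, int]]:
    """For each sender edge in one hyperperiod, return the number of base ticks until the
    first receiver edge at least `min_gap` ticks later (a hypothetical timing budget).

    Because both clocks are divided from the same base clock, the edge pattern repeats
    every lcm(send_div, recv_div) ticks, so this finite table covers every crossing."""
    hyper = lcm(send_div, recv_div)
    latencies = {}
    for t in rising_edges(send_div, hyper):
        earliest = t + min_gap
        # Receiver edges are multiples of recv_div: round `earliest` up to the next one.
        capture = -(-earliest // recv_div) * recv_div
        latencies[t] = capture - t
    return hyper, latencies


if __name__ == "__main__":
    print(crossing_latencies(2, 3))
```

For a divide-by-2 sender and a divide-by-3 receiver, the pattern repeats every 6 base ticks, and the three sender edges in that window see capture latencies of 3, 1, and 2 base ticks. When both sides use the same divider, the crossing degenerates to an ordinary one-cycle synchronous path.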
We present the UE-CGRA analytical model, compiler, architectural template, and VLSI circuitry, and we demonstrate how UE-CGRAs can specialize for irregular loops and improve performance (1.42-1.50x) or energy efficiency (1.24-2.32x) with reasonable area overhead compared to traditional inelastic and elastic CGRAs, while also improving performance (1.35-3.38x) or energy efficiency (up to 1.53x) compared to a RISC-V core.
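Why sprinting and resting can help is easiest to see in a first-order model. The sketch below is a toy under stated assumptions, not the paper's analytical model. It assumes each PE fires once per loop iteration, that the iteration interval is set by the PEs on the loop-carried (true-dependency) cycle, each adding one cycle of its own clock, that off-cycle PEs only need to keep pace, and that dynamic energy per operation scales with V^2 under a hypothetical affine V-f curve. All names and constants are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PE:
    name: str
    on_recurrence: bool  # True if the PE lies on the loop-carried dependency cycle
    freq: float          # clock frequency normalized to nominal (1.0)


def voltage(freq: float) -> float:
    """Hypothetical affine V-f curve: 1.0 V at nominal frequency."""
    return 0.5 + 0.5 * freq


def iteration_interval(pes: list[PE]) -> float:
    """Time per iteration: each recurrence PE adds one cycle of its own clock."""
    return sum(1.0 / pe.freq for pe in pes if pe.on_recurrence)


def keeps_pace(pes: list[PE], tol: float = 1e-9) -> bool:
    """Off-recurrence PEs fire once per iteration, so their period must fit in the interval."""
    ii = iteration_interval(pes)
    return all(1.0 / pe.freq <= ii + tol for pe in pes if not pe.on_recurrence)


def energy_per_iteration(pes: list[PE]) -> float:
    """Dynamic-energy proxy: one operation per PE per iteration, each costing ~V^2."""
    return sum(voltage(pe.freq) ** 2 for pe in pes)


def make_array(rec_freq: float, other_freq: float, n_rec: int = 3, n_other: int = 5) -> list[PE]:
    """Hypothetical array: n_rec PEs on the recurrence, n_other PEs off it."""
    return ([PE(f"rec{i}", True, rec_freq) for i in range(n_rec)]
            + [PE(f"other{i}", False, other_freq) for i in range(n_other)])


if __name__ == "__main__":
    baseline = make_array(1.0, 1.0)  # every PE at nominal V/f
    ultra = make_array(1.5, 0.5)     # sprint the recurrence, rest everything else
    for label, pes in (("baseline", baseline), ("sprint/rest", ultra)):
        print(label, iteration_interval(pes), keeps_pace(pes), energy_per_iteration(pes))
```

In this toy setting, the sprint/rest configuration shortens the iteration interval from 3.0 to 2.0 time units while the energy proxy falls from 8.0 to 7.5. Whether real hardware sees a similar trade-off depends on the actual V-f curve, DVFS and crossing overheads, and leakage, all of which this sketch ignores.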
