Journal
IEEE JOURNAL ON EXPLORATORY SOLID-STATE COMPUTATIONAL DEVICES AND CIRCUITS
Volume 7, Issue 1, Pages 70-78
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JXCDC.2021.3092436
Keywords
3-D integration; energy efficient; systolic accelerators; thermal
This study explores the design space of 3D systolic accelerators, proposing and evaluating various partitioned accelerator configurations. The results demonstrate that different partitioning methods can significantly reduce latency or energy consumption, and organizing the systolic array and SRAM tiers can limit temperature rises.
Machine learning (ML) accelerators have a broad spectrum of use cases that impose different requirements on accelerator design for latency, energy, and area. For systolic array-based ML accelerators, these requirements translate into different constraints on processing element (PE) array dimensions and SRAM buffer sizes. 3-D integration packs more compute or memory into the same 2-D footprint, which can be used to build more powerful or more energy-efficient accelerators. However, 3-D integration also expands the design space of ML accelerators, because the PE array and SRAM buffers can be partitioned among the vertical tiers in different ways. Moreover, each partitioning approach may have different thermal implications. This work provides a systematic framework for performing system-level design space exploration of 3-D systolic accelerators. Using this framework, different 3-D-partitioned accelerator configurations are proposed and evaluated. The 3-D-stacked accelerator designs are modeled using the hybrid wafer bonding technique with a 3-D connection pitch of 1.44 µm. Results show that different partitionings of the systolic array and SRAM buffers in a four-tier 3-D configuration can lead to either a 1.1-3.9x latency reduction or a 1-3x energy reduction compared to a baseline design with the same 2-D area footprint. It is also shown that by carefully organizing the systolic array and SRAM tiers using logic over memory, the temperature rise with 3-D across benchmarks can be limited to 6 °C.