Article

A Multiply-and-Accumulate Array for Machine Learning Applications Based on a 3D Nanofabric Flow

Journal

IEEE TRANSACTIONS ON NANOTECHNOLOGY
Volume 20, Pages 873-882

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNANO.2021.3132224

Keywords

3D logic integration; emerging technologies; hardware accelerators; nanotechnologies

Funding

  1. IMEC core partners' CMOS program

This paper builds on a previously introduced 3D integration scheme, 3D Nanofabric, which lowers manufacturing cost by patterning multiple vertical layers simultaneously. Using this scheme, the authors demonstrate a low-footprint MAC unit and a systolic 3D MAC array aimed at convolutional neural networks, showing significant improvements in area, area-delay product, and TOPS/mm² over traditional 2D implementations, at the cost of a 43% energy overhead.
To sustain the cadence of Moore's law and keep improving integrated-circuit area, delay, and power, novel fabrication schemes such as parallel and monolithic 3D integration have recently been proposed. While parallel 3D is limited by the large through-silicon via (TSV) pitch, monolithic 3D suffers from the high cost of the additional masks and processing steps, which limits the number of stacked transistor layers. In our previous work, we introduced a novel 3D integration scheme called 3D Nanofabric. Inspired by the 3D NAND flash process, the flow consists of N identical vertical tiers in which multiple vertical layers can be patterned simultaneously, significantly reducing the manufacturing cost. In this paper, we propose to build low-footprint Multiply-And-Accumulate (MAC) units using our 3D Nanofabric flow. Since a MAC unit can be laid out as a regular array, we demonstrate how to arrange it in a 3D fashion across several vertical tiers of the 3D Nanofabric. Through circuit-level evaluations, we show that for a 64-input bit MAC unit consisting of 64 stacked vertical tiers, the area and area-delay product are reduced by 21.0x and 16.7x, respectively, compared to a traditional 2D implementation in a 28 nm FDSOI technology, with only a 43% energy overhead. More importantly, the total fabrication cost is reduced, producing a cost scaling roadmap. Additionally, we show how to build a systolic 3D MAC array aimed at convolutional neural networks. Through architectural evaluations, we demonstrate that when running VGG-16, our 3D MAC array can improve the TOPS/mm² by 2.8x compared to a TPU-like 2D systolic array.
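
For readers unfamiliar with the terms used in the abstract, the following is a minimal Python sketch of what a MAC operation and a systolic MAC array compute. It is an illustration only, not the authors' circuit or architecture: the function names (`mac`, `systolic_matmul`, `implied_delay_factor`) are hypothetical, and the output-stationary dataflow is an assumption chosen for simplicity, since the abstract does not state which dataflow the 3D array uses. The last helper assumes that area-delay product is simply area times delay, so the delay ratio it returns is inferred from the two reported factors rather than stated in the abstract.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: acc + a * b."""
    return acc + a * b


def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array (assumed dataflow).

    Processing element (i, j) holds the running sum for C[i][j]. With skewed
    inputs, the operand pair A[i][k], B[k][j] reaches that element at cycle
    t = i + j + k, so each element performs at most one MAC per cycle.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    last_cycle = (n - 1) + (p - 1) + (m - 1)
    for t in range(last_cycle + 1):
        for i in range(n):
            for j in range(p):
                k = t - i - j  # index of the operand pair arriving this cycle
                if 0 <= k < m:
                    C[i][j] = mac(C[i][j], A[i][k], B[k][j])
    return C


def implied_delay_factor(area_reduction, adp_reduction):
    """Delay ratio (3D over 2D) implied by the two reductions, assuming ADP = area * delay."""
    return area_reduction / adp_reduction


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(systolic_matmul(A, B))
    # Figures reported in the abstract: 21.0x area and 16.7x ADP reduction.
    print(round(implied_delay_factor(21.0, 16.7), 3))
```

Under the stated assumption, the reported 21.0x and 16.7x factors would correspond to the 3D MAC unit being roughly 1.26 times slower than the 2D baseline while occupying far less area.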
