Article

A Multiply-and-Accumulate Array for Machine Learning Applications Based on a 3D Nanofabric Flow

IEEE TRANSACTIONS ON NANOTECHNOLOGY (2021)

Journal

IEEE TRANSACTIONS ON NANOTECHNOLOGY

Volume 20, Pages 873-882

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNANO.2021.3132224

Keywords

3D logic integration; emerging technologies; hardware accelerators; nanotechnologies

Funding

IMEC core partners' CMOS program

Automated Summary

This paper builds on 3D Nanofabric, a 3D integration scheme introduced in the authors' previous work to reduce manufacturing cost. Using this scheme, it demonstrates a low-footprint MAC unit and a systolic 3D MAC array aimed at convolutional neural networks. Compared to traditional 2D implementations, these designs show significant improvements in area, area-delay product, and TOPs/mm², at the cost of a 43% energy overhead for the MAC unit.

Abstract

To keep up with the Moore's law cadence and improve integrated-circuit area, delay, and power, novel fabrication schemes such as parallel and monolithic 3D integration have recently been proposed. Parallel 3D is limited by the large through-silicon via (TSV) pitch. Monolithic 3D suffers from the high cost of additional masks and processing steps, which limits the number of stacked transistor layers. In our previous work, we introduced a novel 3D integration scheme called 3D Nanofabric. Inspired by the 3D NAND flash process, the flow consists of N identical vertical tiers in which multiple vertical layers can be patterned simultaneously, significantly reducing the manufacturing cost. In this paper, we propose to build low-footprint Multiply-And-Accumulate (MAC) units using our 3D Nanofabric flow. Since a MAC unit can be laid out as a regular array, we demonstrate how to arrange it in 3D across several vertical tiers of the 3D Nanofabric. We evaluated the design at the circuit level against a traditional 2D implementation in a 28 nm FDSOI technology. For a 64-input-bit MAC unit consisting of 64 stacked vertical tiers, area is reduced by 21.0x and area-delay product by 16.7x, with only a 43% energy overhead. More importantly, the total fabrication cost is reduced, providing a cost-scaling roadmap. Additionally, we show how to build a systolic 3D MAC array aimed at convolutional neural networks. Through architectural evaluations, we demonstrate that when running VGG-16, our 3D MAC array can improve TOPs/mm² by 2.8x compared to a TPU-like 2D systolic array.
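The two reported circuit-level ratios also imply the delay trade-off of the 64-tier MAC unit. Area-delay product is ADP = A · D, so dividing the area reduction by the ADP reduction gives the delay of the 3D unit relative to the 2D baseline. This is derived arithmetic, not a number stated by the authors, and rounding in the reported ratios carries over into it:

```latex
\frac{D_{3D}}{D_{2D}}
  = \frac{\mathrm{ADP}_{3D}/A_{3D}}{\mathrm{ADP}_{2D}/A_{2D}}
  = \frac{A_{2D}/A_{3D}}{\mathrm{ADP}_{2D}/\mathrm{ADP}_{3D}}
  = \frac{21.0}{16.7} \approx 1.26
```

On these numbers, the 3D unit occupies about 1/21 of the 2D footprint and is roughly 26% slower.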

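The abstract compares the 3D MAC array against a TPU-like 2D systolic array. The sketch below illustrates what a systolic MAC array does. It is not the paper's 3D design. It simulates a generic 2D output-stationary array cycle by cycle. Each processing element (PE) owns one output value and performs one multiply-and-accumulate per cycle. Operands ripple rightward and downward from skewed edge feeds. This dataflow was chosen for brevity. TPU-style arrays are commonly described as weight-stationary instead. The function name `systolic_matmul` is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical output-stationary systolic array computing a @ b.

    PE(i, j) keeps the accumulator for C[i][j]. Row i of `a` enters from the
    left edge delayed by i cycles; column j of `b` enters from the top edge
    delayed by j cycles, so matching operands meet at PE(i, j).
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])

    acc = [[0] * n for _ in range(m)]    # one accumulator per PE
    a_reg = [[0] * n for _ in range(m)]  # operand each PE forwards to the east
    b_reg = [[0] * n for _ in range(m)]  # operand each PE forwards to the south

    # The last operand pair reaches PE(m-1, n-1) at cycle k + m + n - 3.
    for t in range(k + m + n - 2):
        new_a = [[0] * n for _ in range(m)]
        new_b = [[0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                if j == 0:  # skewed feed on the left edge
                    idx = t - i
                    a_in = a[i][idx] if 0 <= idx < k else 0
                else:       # value latched by the west neighbour last cycle
                    a_in = a_reg[i][j - 1]
                if i == 0:  # skewed feed on the top edge
                    idx = t - j
                    b_in = b[idx][j] if 0 <= idx < k else 0
                else:       # value latched by the north neighbour last cycle
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # the multiply-and-accumulate step
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b  # all PEs latch simultaneously
    return acc
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, which is the ordinary matrix product. The array needs k + m + n - 2 = 4 cycles to produce it. Padding with zeros outside the skewed windows keeps every PE busy with harmless 0 · x products. Real hardware would instead gate those cycles.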