Article

A Multiply-and-Accumulate Array for Machine Learning Applications Based on a 3D Nanofabric Flow

IEEE TRANSACTIONS ON NANOTECHNOLOGY (2021)

Journal

IEEE TRANSACTIONS ON NANOTECHNOLOGY

Volume 20, Pages 873-882

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNANO.2021.3132224

Keywords

3D logic integration; emerging technologies; hardware accelerators; nanotechnologies

Categories

Engineering, Electrical & Electronic; Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied

Funding

IMEC core partners' CMOS program

Summary

This paper builds on 3D Nanofabric, a 3D integration scheme introduced in the authors' previous work to reduce manufacturing cost. Using this scheme, it demonstrates a low-footprint MAC unit and a systolic 3D MAC array aimed at convolutional neural networks, showing significant improvements in area, area-delay product, and TOPS/mm² over traditional 2D implementations, at the price of a moderate energy overhead.

Abstract

To keep up the cadence of Moore's law and improve integrated circuit area, delay, and power, novel fabrication schemes such as parallel and monolithic 3D integration have recently been proposed. While parallel 3D is limited by the large through-silicon via (TSV) pitch, monolithic 3D suffers from the high cost of the additional masks and processing steps, which limits the number of stacked transistor layers. In our previous work, we introduced a novel 3D integration scheme called 3D Nanofabric. Inspired by the 3D NAND flash process, the flow consists of N identical vertical tiers in which multiple vertical layers can be patterned simultaneously, significantly reducing the manufacturing cost. In this paper, we propose to build low-footprint Multiply-And-Accumulate (MAC) units using our 3D Nanofabric flow. Since a MAC unit can be laid out as a regular array, we demonstrate how to arrange it in a 3D fashion across several vertical tiers of the 3D Nanofabric. Through circuit-level evaluations, we show that for a 64-input bit MAC unit consisting of 64 stacked vertical tiers, the area and area-delay product are reduced by 21.0x and 16.7x, respectively, compared to a traditional 2D implementation in a 28 nm FDSOI technology, with only a 43% energy overhead. More importantly, the total fabrication cost is reduced, which provides a cost-scaling roadmap. Additionally, we show how to build a systolic 3D MAC array aimed at convolutional neural networks. Through architectural evaluations, we demonstrate that when running VGG-16, our 3D MAC array can improve the TOPS/mm² by 2.8x compared to a TPU-like 2D systolic array.
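As a rough illustration of two ideas in the abstract, the sketch below shows what a multiply-and-accumulate step computes and what the reported area and area-delay-product figures imply about delay. It is only a behavioral sketch, not the paper's circuit or dataflow. The function names (`mac`, `dot_product`, `implied_delay_ratio`) are hypothetical, and the delay calculation assumes that area-delay product is simply area multiplied by delay.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: returns acc + a * b."""
    return acc + a * b


def dot_product(inputs, weights):
    """Chain MAC steps over paired inputs and weights.

    This is the basic operation behind convolution and
    fully connected layers in a neural network.
    """
    acc = 0
    for x, w in zip(inputs, weights):
        acc = mac(acc, x, w)
    return acc


def implied_delay_ratio(area_reduction, adp_reduction):
    """Delay of the 3D design relative to the 2D baseline.

    Assumes ADP = area * delay, so
    delay_3d / delay_2d = area_reduction / adp_reduction.
    """
    return area_reduction / adp_reduction


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))          # 32
    # Figures quoted in the abstract: 21.0x area, 16.7x ADP reduction.
    print(round(implied_delay_ratio(21.0, 16.7), 2))  # 1.26
```

Under that assumption, the quoted figures suggest the 3D MAC unit is roughly 1.26 times slower than the 2D baseline while occupying 21 times less area, which is why the area-delay-product gain is smaller than the area gain.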
