Journal
ACM SIGPLAN NOTICES
Volume 49, Issue 4, Pages 269-283
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/2541940.2541967
Funding
- Google Faculty Research Award
- Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)
- French ANR MHANN grant
- French ANR NEMESIS grant
- NSF of China [61003064, 61100163, 61133004, 61222204, 61221062, 61303158]
- 863 Program of China [2012AA012202]
- Strategic Priority Research Program of the CAS [XDA06010403]
- 10,000 talent program
- 1,000 talent program
Abstract
Machine-learning tasks are becoming pervasive across a broad range of domains and systems, from embedded devices to data centers. At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) is proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design a high-throughput accelerator capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² at 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster and reduces total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the use of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
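The operations counted in the 452 GOP/s figure are synaptic weight multiplications and neuron output additions. A minimal sketch of how that count arises for a fully-connected layer is below; the layer sizes are illustrative assumptions, not figures from the paper.

```python
# Sketch: how the abstract's "GOP" count maps onto a fully-connected NN layer.
# Each output neuron performs n_in multiplications (one per synaptic weight)
# and n_in additions to accumulate its output, so a layer with n_in inputs
# and n_out outputs costs 2 * n_in * n_out operations.
# The 4096x4096 layer below is a hypothetical example.

def layer_ops(n_in: int, n_out: int) -> int:
    """Multiply and add operations for one fully-connected layer."""
    return 2 * n_in * n_out

ops = layer_ops(4096, 4096)   # illustrative large layer
throughput = 452e9            # 452 GOP/s, from the abstract
print(f"{ops / 1e6:.1f} MOP, ~{ops / throughput * 1e6:.1f} us at 452 GOP/s")
```

At the abstract's throughput, even a layer of tens of millions of operations completes in tens of microseconds, which is why the authors argue such an accelerator opens up large CNNs/DNNs to a broad set of systems.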