Journal
IEEE JOURNAL OF SOLID-STATE CIRCUITS
Volume 57, Issue 1, Pages 182-197
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSSC.2021.3120113
Keywords
Training; Artificial intelligence; AI accelerators; Inference algorithms; Computer architecture; Bandwidth; System-on-chip; Approximate computing; artificial intelligence (AI); deep neural networks (DNNs); hardware accelerators; machine learning (ML); reduced precision computation
The article presents a 7-nm four-core mixed-precision artificial intelligence chip that supports four compute precisions, demonstrating high power efficiency and performance while meeting diverse application demands.
Reduced-precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip with four compute precisions (FP16, Hybrid-FP8 (HFP8), INT4, and INT2) to meet diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model accuracy degradation. A new HFP8 format, combined with separation of the floating- and fixed-point pipelines and aggressive circuit/architecture optimization, enables performance improvements while maintaining high compute utilization. A high-bandwidth ring protocol enables efficient data communication, while power management using workload-aware clock throttling maximizes performance within a given power budget. The AI chip demonstrates 3.58-TFLOPS/W peak energy efficiency and 26.2-TFLOPS peak performance for HFP8 iso-accuracy training, and 16.9-TOPS/W peak energy efficiency and 104.9-TOPS peak performance for INT4 iso-accuracy inference.
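To make the reduced-precision idea concrete, the sketch below simulates rounding a value to a simple sign/exponent/mantissa 8-bit floating-point format in software. This is a minimal illustration, not the chip's actual HFP8 arithmetic: the 1-sign/4-exponent/3-mantissa split, the function name, and the normals-only simplification (no subnormals, no NaN/Inf handling) are assumptions for demonstration.

```python
import math

def quantize_fp8(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value representable in a toy FP8-like
    format (sign + exp_bits exponent + man_bits mantissa, normals only).
    Illustrative sketch; real hardware formats also handle subnormals,
    saturation, and special values."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    bias = 2 ** (exp_bits - 1) - 1            # e.g. 7 for 4 exponent bits
    e = math.floor(math.log2(mag))
    e = max(min(e, bias), 1 - bias)           # clamp to the normal range
    # Keep man_bits fractional bits of the significand, round to nearest.
    frac = round(mag / 2.0 ** e * 2 ** man_bits) / 2 ** man_bits
    return sign * frac * 2.0 ** e

# Example: 0.1 is not exactly representable in 3 mantissa bits,
# so it rounds to a nearby representable value.
print(quantize_fp8(0.1))    # a coarse approximation of 0.1
print(quantize_fp8(-0.25))  # exactly representable, unchanged
```

Training with such narrow formats loses precision on every value, which is why the abstract emphasizes algorithmic co-design (the HFP8 format) to reach iso-accuracy despite this coarse quantization.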