
Custom Hardware Architectures for Deep Learning on Portable Devices: A Review

Journal

IEEE Transactions on Neural Networks and Learning Systems

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TNNLS.2021.3082304

Keywords

Computer architecture; Hardware; Computational modeling; Artificial neural networks; Optimization; Memory management; Convolution; Application-specific integrated circuit (ASIC); deep learning (DL); deep neural network (DNN); energy-efficient architectures; field-programmable gate array (FPGA); hardware accelerator; machine learning (ML); neural network hardware; review

Funding

  1. Universiti Kebangsaan Malaysia [DPK-2021-001, DIP-2020-004, MI-2020-002]
  2. Qatar National Research Foundation (QNRF) [NPRP12s-0227190164]


Recent advancements in deep learning applications have prompted researchers to reconsider hardware architecture to meet the demands of fast and efficient application-specific computations. The emergence of specialized DL processors has reduced reliance on cloud servers, improved privacy, reduced latency, and alleviated bandwidth congestion. As researchers explore various application-specific hardware architectures, new technologies and design considerations are being developed to enhance the performance and efficiency of DL tasks.
The staggering innovations and emergence of numerous deep learning (DL) applications have forced researchers to reconsider hardware architecture to accommodate fast and efficient application-specific computations. Applications, such as object detection, image recognition, speech translation, as well as music synthesis and image generation, can be performed with high accuracy at the expense of substantial computational resources using DL. Furthermore, the desire to adopt Industry 4.0 and smart technologies within the Internet of Things infrastructure has initiated several studies to enable on-chip DL capabilities for resource-constrained devices. Specialized DL processors reduce dependence on cloud servers, improve privacy, lessen latency, and mitigate bandwidth congestion. As we reach the limits of shrinking transistors, researchers are exploring various application-specific hardware architectures to meet the performance and efficiency requirements for DL tasks. Over the past few years, several software optimizations and hardware innovations have been proposed to efficiently perform these computations. In this article, we review several DL accelerators, as well as technologies with emerging devices, to highlight their architectural features in application-specific integrated circuit (IC) and field-programmable gate array (FPGA) platforms. Finally, the design considerations for DL hardware in portable applications have been discussed, along with some deductions about the future trends and potential research directions to innovate DL accelerator architectures further. By compiling this review, we expect to help aspiring researchers widen their knowledge in custom hardware architectures for DL.


Comments

Primary rating

4.7
(insufficient number of ratings)

Secondary ratings

Novelty
-
Significance
-
Scientific rigor
-