Article

Custom Hardware Inference Accelerator for TensorFlow Lite for Microcontrollers

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 73484-73493

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3189776

Keywords

Computational modeling; Artificial neural networks; Hardware acceleration; Microcontrollers; Software; Kernel; Computational efficiency; TinyML; neural processing unit; TensorFlow-Lite for microcontrollers; hardware-software codesign

Funding

  1. High-Tech Scholarship Award
  2. GenPro Consortium

Abstract

In recent years, the need for efficient deployment of Neural Networks (NNs) on edge devices has been steadily increasing. However, the high computational demand of Machine Learning (ML) inference on tiny microcontroller-based IoT devices precludes a direct software deployment on such resource-constrained edge devices. Various custom, application-specific NN hardware accelerators have therefore been proposed to enable real-time ML inference on low-power, resource-limited edge devices. Efficiently mapping the computational load onto hardware and software resources is a key challenge for improving performance while maintaining low power consumption and a small area footprint; hardware acceleration allows embedded processors to achieve high performance at low power. This paper presents an efficient hardware-software framework, referred to as MCU-NPU, that accelerates ML inference on edge devices using a modified TensorFlow Lite for Microcontrollers (TFLM) runtime executing on a Microcontroller (MCU) coupled with a dedicated Neural Processing Unit (NPU) custom hardware accelerator. The framework supports weight compression of pruned, quantized NN models and exploits the sparsity of the pruned models to further reduce computational complexity. The methodology is evaluated by applying MCU-NPU acceleration to various TFLM-based NN architectures from the widely used MLPerf Tiny benchmark. Experimental results demonstrate a speedup of up to 724x over a pure software implementation. For example, the runtime for CIFAR-10 classification is reduced from about 20 s to only 37 ms with the proposed hardware acceleration. Moreover, the proposed accelerator outperforms all reference models optimized for edge devices in terms of inference runtime.
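Although the record gives no implementation details, the abstract's core idea (a TFLM runtime on the MCU handing compute-heavy kernels to an NPU) can be illustrated with a short sketch using TFLM's public custom-operator mechanism. The operator name "NPU_CONV2D", the NpuConv2DEval kernel, the driver call, and the arena size below are illustrative assumptions rather than the authors' actual design, and TFLM API details vary across library versions.

```cpp
// Minimal sketch: registering a hypothetical NPU-backed Conv2D with
// TensorFlow Lite for Microcontrollers. Everything marked
// "hypothetical" is an assumption, not taken from the paper.

#include <cstdint>
#include <cstring>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

namespace {

// Hypothetical kernel: in a real port this would DMA the quantized
// input and the compressed, pruned weight tensors to the accelerator,
// start it via a control register, and wait for the done signal.
TfLiteStatus NpuConv2DEval(TfLiteContext* /*context*/, TfLiteNode* /*node*/) {
  // npu_run_conv2d(...);  // hypothetical driver call
  return kTfLiteOk;
}

// Registration struct; unspecified fields are zero-initialized.
TfLiteRegistration npu_conv2d_registration = {
    /*init=*/nullptr, /*free=*/nullptr,
    /*prepare=*/nullptr, /*invoke=*/NpuConv2DEval};

}  // namespace

int RunInference(const tflite::Model* model, const int8_t* image,
                 size_t image_bytes) {
  // Arena size is an assumption; it must hold all activation buffers.
  static uint8_t tensor_arena[64 * 1024];

  // Route convolutions (exported as the custom op "NPU_CONV2D") to the
  // accelerator; the remaining ops run in software on the MCU core.
  tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddCustom("NPU_CONV2D", &npu_conv2d_registration);
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       sizeof(tensor_arena));
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  // Copy the quantized input and run; the heavy convolution layers now
  // execute on the NPU instead of the MCU core.
  TfLiteTensor* input = interpreter.input(0);
  std::memcpy(input->data.int8, image, image_bytes);
  if (interpreter.Invoke() != kTfLiteOk) return -1;
  return 0;
}
```

In such a flow, the model would be rewritten at conversion time so that the accelerated layers appear as the custom operator, which is one common way to splice a hardware backend into TFLM without modifying the interpreter itself.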
