Article

ABM-SpConv-SIMD: Accelerating Convolutional Neural Network Inference for Industrial IoT Applications on Edge Devices

Journal

IEEE Transactions on Network Science and Engineering

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TNSE.2022.3154412

Keywords

Convolutional neural networks; edge devices; industrial internet-of-things applications; single instruction multiple data; sparse convolution


Convolutional Neural Networks (CNNs) have been widely deployed, but traditional cloud-data-center-based applications suffer from high network bandwidth and latency demands when applied to Industrial Internet-of-Things (IIoT) fields. For efficiency and security reasons, it is critical to migrate CNN inference to edge devices. However, deploying complex CNNs on resource-constrained IIoT edge devices is challenging due to their large number of parameters and intensive floating-point computations. In this paper, we propose ABM-SpConv-SIMD, an on-device inference optimization framework that aims to accelerate network inference by fully utilizing low-cost and commonly available CPU resources. ABM-SpConv-SIMD first applies a model optimizer with pruning and quantization, which produces sparse convolutional models. An Accumulation-Before-Multiplication mechanism is then proposed to reduce the number of multiplication operations. Additionally, SIMD instructions, which are commonly available on cost-effective edge devices, are employed to improve the performance of convolutions. We have implemented ABM-SpConv-SIMD based on the ARM Compute Library software framework and evaluated it on Hikey970 and Raspberry Pi devices with two representative models, AlexNet and ResNet50. The results show that ABM-SpConv-SIMD significantly improves performance, achieving average speedups of 1.96x and 1.73x, respectively, over the baseline implementation with negligible loss of accuracy.
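The core Accumulation-Before-Multiplication idea can be illustrated with a small sketch (a hypothetical simplification for a single dot product, not the authors' implementation): after quantization, weights take only a few distinct values, so the inputs that share a weight value can be summed first and each distinct weight multiplied only once, while pruned (zero) weights are skipped entirely.

```python
from collections import defaultdict

def dot_abm(weights, inputs):
    """Dot product via Accumulation-Before-Multiplication:
    group inputs by their (quantized) weight value, sum each
    group, then do one multiplication per distinct weight."""
    groups = defaultdict(float)
    for w, x in zip(weights, inputs):
        if w != 0:              # sparse convolution: skip pruned weights
            groups[w] += x      # accumulate inputs sharing weight w
    return sum(w * s for w, s in groups.items())

# With only two distinct nonzero weight values, eight
# multiply-accumulates collapse to two multiplications.
weights = [3, 0, 3, 5, 0, 5, 3, 5]
inputs  = [1, 2, 3, 4, 5, 6, 7, 8]
print(dot_abm(weights, inputs))  # matches the naive dot product: 123
```

In a real convolution kernel the per-group accumulations are exactly the additions that SIMD instructions can vectorize, which is how the accumulation step and the SIMD optimization compose.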
