Proceedings Paper

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Publisher

IEEE

Keywords

Deep Learning; NAS; Quantization; TinyML

Funding

  1. ECSEL Joint Undertaking (JU) [101007321]
  2. European Union; France; Belgium; Czech Republic; Germany; Italy; Sweden; Switzerland; Turkey

Abstract

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision works layer-wise, i.e., it uses different bit-widths for the weights and activations tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs model size and accuracy vs energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27% respectively compared to a layer-wise approach, for the same accuracy.
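As a rough illustration of the channel-wise idea (this is not the paper's NAS tool), the sketch below applies uniform symmetric fake quantization to a weight tensor, letting each output channel use its own bit-width. The bit assignment here is a hypothetical, fixed choice standing in for what the proposed NAS would search for.

```python
# Illustrative sketch only: channel-wise fake quantization where each
# output channel of a weight tensor may use a different bit-width.
# The bit assignment is hypothetical; the paper's NAS learns it.
import numpy as np

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization of a 1-D weight slice."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

def channelwise_quantize(weights, bit_assignment):
    """Quantize each output channel with its own bit-width.

    weights:        (out_channels, in_features) array
    bit_assignment: one bit-width per output channel
    """
    return np.stack([
        fake_quantize(weights[c], bit_assignment[c])
        for c in range(weights.shape[0])
    ])

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
# Hypothetical outcome of the search: more bits only for the
# channels carrying the most informative features.
bits = [8, 4, 2, 4]
wq = channelwise_quantize(w, bits)
```

Widening the search space from one bit-width per layer to one per channel is what lets a NAS spend precision only where it matters, which is the source of the memory and energy savings reported above.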

