Article

Research on knowledge distillation algorithm based on Yolov5 attention mechanism

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 240, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.122553

Keywords

Target detection; Knowledge distillation; Deep learning; Feature acquisition; Transfer learning; Model compression

This paper presents an improved knowledge distillation algorithm that can effectively compress models and improve detection performance on mobile devices by enhancing global feature representation and introducing an interpretable feature shifting method.
State-of-the-art CNN-based detection models are difficult to deploy on mobile devices with limited computing power because they carry many redundant parameters and demand excessive computation; knowledge distillation, as a practical model compression approach, can alleviate this limitation. Earlier feature-based knowledge distillation algorithms concentrated on transferring hand-specified local features and therefore failed to fully capture the global information in an image. To address the shortcomings of traditional feature distillation, we first improve GAMAttention to learn global feature representations; the improved attention mechanism minimizes the information loss incurred when processing features. Second, feature shifting no longer requires manually defining which features should be transferred: a more interpretable approach is proposed in which the student network learns to emulate the high-response feature regions predicted by the teacher network. This strengthens the end-to-end character of the model and allows the student network to imitate the teacher in generating semantically strong feature maps, improving the detection performance of the small model. To avoid learning too many noisy background features, these two parts of the feature distillation are assigned different weights. Finally, logit distillation is performed on the prediction heads of the student and teacher networks. In our experiments, we chose Yolov5 as the base network structure for the teacher-student pairs. By combining attention and knowledge distillation, the improved Yolov5s achieves a 1.3% performance gain on VOC and a 1.8% gain on KITTI.
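The abstract describes three coupled objectives: feature imitation guided by the teacher's high-response regions, a down-weighting of background features to suppress noise, and logit distillation on the prediction heads. The PyTorch sketch below illustrates one plausible way to combine them; it is an illustrative assumption, not the authors' released code. The mask construction, the hyperparameters (fg_weight, bg_weight, alpha, beta, T), and the assumption that student and teacher feature maps already share channel dimensions (otherwise a 1x1 adapter convolution would be needed) are all our own simplifications.

```python
# Minimal sketch of a weighted feature-imitation + logit-distillation objective,
# loosely following the abstract. NOT the authors' implementation.
import torch
import torch.nn.functional as F


def feature_imitation_loss(student_feat, teacher_feat, fg_weight=1.0, bg_weight=0.1):
    """Student mimics the teacher's feature map; locations where the teacher
    responds strongly are weighted more than background-like locations."""
    # Per-location teacher "response": mean absolute activation over channels.
    response = teacher_feat.abs().mean(dim=1, keepdim=True)           # (N, 1, H, W)
    threshold = response.mean(dim=(2, 3), keepdim=True)               # (N, 1, 1, 1)
    mask = (response > threshold).float()                             # 1 = high response
    weights = fg_weight * mask + bg_weight * (1.0 - mask)
    return (weights * (student_feat - teacher_feat) ** 2).mean()


def logit_distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened prediction-head outputs."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def total_distillation_loss(det_loss, s_feats, t_feats, s_logits, t_logits,
                            alpha=1.0, beta=0.5):
    """Standard detection loss + feature imitation over all feature levels
    + logit distillation, with assumed balancing weights alpha and beta."""
    feat_loss = sum(feature_imitation_loss(s, t) for s, t in zip(s_feats, t_feats))
    return det_loss + alpha * feat_loss + beta * logit_distillation_loss(s_logits, t_logits)
```

In a Yolov5 teacher-student pairing (presumably a larger Yolov5 variant teaching Yolov5s), s_feats and t_feats would typically be the corresponding FPN output levels and det_loss the standard Yolov5 training objective; the exact pairing and weighting are not specified in the abstract.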
