Article

Improving Knowledge Distillation With a Customized Teacher

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2022.3189680

Keywords

Knowledge distillation (KD); knowledge transfer; neural network acceleration; neural network compression

Funding

  1. National Key Research and Development Program of China [2018YFB0204301]
  2. National Key Research and Development Program [2017YFB0202104]


Knowledge distillation (KD) transfers knowledge from a complex network to a lightweight one. This paper selects teachers by the standard deviation of their secondary soft probabilities, and increases that dispersion through pretraining under dual supervision and an asymmetrical transformation function.

Knowledge distillation (KD) is a widely used approach to transfer knowledge from a cumbersome network (the teacher) to a lightweight network (the student). However, even when different teachers reach similar accuracies, the accuracies of a fixed student distilled from them differ significantly. We find that teachers with more dispersed secondary soft probabilities are better qualified for the role. Therefore, an indicator, the standard deviation sigma of the secondary soft probabilities, is introduced to choose the teacher. Moreover, to make a teacher's secondary soft probabilities more dispersed, a novel method, dubbed pretraining the teacher under dual supervision (PTDS), is proposed. In addition, we put forward an asymmetrical transformation function (ATF) to further enhance the dispersion of the pretrained teacher's secondary soft probabilities. The combination of PTDS and ATF is termed knowledge distillation with a customized teacher (KDCT). Extensive empirical experiments and analyses are conducted on three computer vision tasks, including image classification, transfer learning, and semantic segmentation, to substantiate the effectiveness of KDCT.
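The abstract's teacher-selection indicator can be illustrated with a short sketch: compute a teacher's temperature-softened class probabilities, drop the top-1 (primary) class, and take the standard deviation of the remaining "secondary" probabilities. This is a minimal illustration under assumed details (the function name, the temperature T, and the exact definition of "secondary" are assumptions, not the paper's precise formulation):

```python
import numpy as np

def secondary_soft_prob_std(logits, T=4.0):
    """Sketch of the sigma indicator: std of a teacher's secondary
    soft probabilities. T is an assumed softening temperature; the
    paper's exact setup may differ."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                          # numerical stability
    p = np.exp(z) / np.exp(z).sum()       # softened probabilities
    secondary = np.delete(p, p.argmax())  # remove the primary class
    return secondary.std()

# A teacher whose secondary probabilities are uneven (more dispersed)
# scores higher than one whose secondary probabilities are uniform.
sigma_dispersed = secondary_soft_prob_std([10, 5, 1, 1, 1])
sigma_uniform = secondary_soft_prob_std([10, 2, 2, 2, 2])
```

Under this sketch, the paper's selection rule amounts to preferring the teacher with the larger sigma among candidates of similar accuracy.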

