Article

Improving knowledge distillation via an expressive teacher

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 218

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.106837

Keywords

Neural network compression; Knowledge distillation; Knowledge transfer

Funding

  1. National Key Research and Development Program of China [2018YFB0204301]
  2. National Key Research and Development Program, China [2017YFB0202104]
  3. National Natural Science Foundation of China [61806213]

Abstract

Knowledge distillation is a network compression technique in which a teacher network guides a student network to mimic its behavior. This study explores how to train a good teacher and proposes an inter-class correlation regularization. Experimental results show that the method achieves good performance on image classification tasks.
Knowledge distillation (KD) is a widely used network compression technique that seeks a light student network with behavior similar to that of its heavy teacher network. Previous studies mainly focus on training the student to mimic the representation space of the teacher; however, how to be a good teacher is rarely explored. We find that if a teacher has a weak ability to capture the knowledge underlying the true data in the real world, the student cannot learn that knowledge from its teacher. Inspired by this, we propose an inter-class correlation regularization that trains the teacher to capture a more explicit correlation among classes. In addition, we enforce the student to mimic the inter-class correlation of its teacher. Extensive experiments on image classification have been conducted on four public benchmarks. For example, when the teacher and student networks are ShuffleNetV2-1.0 and ShuffleNetV2-0.5, the proposed method achieves a 42.63% top-1 error rate on Tiny ImageNet. (C) 2021 Elsevier B.V. All rights reserved.
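The abstract does not give the exact loss formulation, so the following is a minimal sketch of the idea in PyTorch: a standard Hinton-style distillation loss plus a hypothetical inter-class correlation term built from class-mean softened predictions. The function names, the cosine-similarity form of the correlation matrix, and the weights T, alpha, and beta are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Standard knowledge distillation loss: softened KL term between
    student and teacher plus cross-entropy on the true labels."""
    soft_t = F.softmax(teacher_logits / T, dim=1)
    log_soft_s = F.log_softmax(student_logits / T, dim=1)
    kl = F.kl_div(log_soft_s, soft_t, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kl + (1.0 - alpha) * ce

def inter_class_correlation(logits, targets, num_classes, T=4.0):
    """Hypothetical inter-class correlation matrix: cosine similarity
    between class-mean softened predictions, shape (C, C)."""
    probs = F.softmax(logits / T, dim=1)
    # Accumulate predictions per ground-truth class, then average.
    class_means = torch.zeros(num_classes, num_classes,
                              device=logits.device, dtype=probs.dtype)
    class_means.index_add_(0, targets, probs)
    counts = torch.bincount(targets, minlength=num_classes).clamp(min=1)
    class_means = class_means / counts.unsqueeze(1).to(probs.dtype)
    normed = F.normalize(class_means, dim=1)
    return normed @ normed.t()

def icc_regularized_student_loss(student_logits, teacher_logits, targets,
                                 num_classes, T=4.0, alpha=0.9, beta=1.0):
    """KD loss plus a term pushing the student's inter-class correlation
    matrix toward the teacher's (a sketch of the abstract's idea)."""
    base = kd_loss(student_logits, teacher_logits, targets, T, alpha)
    icc_s = inter_class_correlation(student_logits, targets, num_classes, T)
    icc_t = inter_class_correlation(teacher_logits, targets, num_classes, T)
    return base + beta * F.mse_loss(icc_s, icc_t)
```

Under the paper's framing, a regularization term of this kind would also be applied when training the teacher itself, so that the teacher exposes a more explicit correlation structure among classes for the student to mimic; the exact form of that teacher-side term is not specified in the abstract.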
