Article

Improving knowledge distillation via an expressive teacher

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 218

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.106837

Keywords

Neural network compression; Knowledge distillation; Knowledge transfer

Funding

  1. National Key Research and Development Program of China [2018YFB0204301]
  2. National Key Research and Development Program, China [2017YFB0202104]
  3. National Natural Science Foundation of China [61806213]

Knowledge distillation is a network compression technique in which a teacher network guides a student network to mimic its behavior. This study explores how to train a good teacher, proposing an inter-class correlation regularization. Experimental results show that the method achieves good performance on image classification tasks.
Knowledge distillation (KD) is a widely used network compression technique that seeks a lightweight student network whose behavior is similar to that of its heavier teacher network. Previous studies mainly focus on training the student to mimic the representation space of the teacher; however, how to train a good teacher is rarely explored. We find that if a teacher is weak at capturing the knowledge underlying the true data, the student cannot learn that knowledge from it either. Inspired by this, we propose an inter-class correlation regularization that trains the teacher to capture a more explicit correlation among classes, and we further enforce the student to mimic the inter-class correlation of its teacher. Extensive image classification experiments have been conducted on four public benchmarks. For example, when the teacher and student networks are ShuffleNetV2-1.0 and ShuffleNetV2-0.5, the proposed method achieves a 42.63% top-1 error rate on Tiny ImageNet. (C) 2021 Elsevier B.V. All rights reserved.
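
The abstract does not specify the exact form of the inter-class correlation regularization, so the sketch below is only an illustration under stated assumptions: a classic Hinton-style KD loss combined with a hypothetical regularizer that matches class-similarity matrices computed from teacher and student logits. The function names (class_correlation, kd_with_class_correlation) and the hyperparameters T, alpha, and beta are illustrative, not taken from the paper.

    # Minimal PyTorch sketch (an assumption, not the authors' exact method):
    # soften the logits, apply the usual KD objective, and add a term that pushes
    # the student's inter-class correlation matrix toward the teacher's.
    import torch
    import torch.nn.functional as F

    def class_correlation(logits: torch.Tensor) -> torch.Tensor:
        """Proxy for inter-class correlation: cosine similarity between class
        'profiles', i.e. columns of the softened prediction matrix over a batch."""
        p = F.softmax(logits, dim=1)        # (batch, num_classes)
        cols = F.normalize(p.t(), dim=1)    # (num_classes, batch), unit-norm rows
        return cols @ cols.t()              # (num_classes, num_classes)

    def kd_with_class_correlation(student_logits, teacher_logits, targets,
                                  T=4.0, alpha=0.9, beta=1.0):
        # Hard-label cross-entropy on the student predictions.
        ce = F.cross_entropy(student_logits, targets)
        # Standard KD term: KL divergence between temperature-softened outputs.
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      F.softmax(teacher_logits / T, dim=1),
                      reduction="batchmean") * (T * T)
        # Assumed regularizer: align student and teacher inter-class correlations.
        icc = F.mse_loss(class_correlation(student_logits),
                         class_correlation(teacher_logits))
        return (1 - alpha) * ce + alpha * kd + beta * icc

A term like class_correlation could likewise be added while training the teacher itself, which is where the abstract says the regularization is applied; the loss weighting above is a design choice for the sketch, not a setting reported by the authors.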
