Journal
KNOWLEDGE-BASED SYSTEMS
Volume 218, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.knosys.2021.106837
Keywords
Neural network compression; Knowledge distillation; Knowledge transfer
Funding
- National Key Research and Development Program of China [2018YFB0204301]
- National Key Research and Development Program, China [2017YFB0202104]
- National Natural Science Foundation of China [61806213]
Knowledge distillation is a network compression technique in which a teacher network guides a student network to mimic its behavior. This study explores how to train a good teacher, proposing an inter-class correlation regularization. Experimental results show that the method achieves good performance on image classification tasks.
Knowledge distillation (KD) is a widely used network compression technique for seeking a light student network whose behavior is similar to that of its heavy teacher network. Previous studies mainly focus on training the student to mimic the representation space of the teacher. However, how to be a good teacher is rarely explored. We find that if a teacher has a weak ability to capture the knowledge underlying the true data in the real world, the student cannot even learn that knowledge from its teacher. Inspired by this, we propose an inter-class correlation regularization that trains the teacher to capture a more explicit correlation among classes. Besides, we enforce the student to mimic the inter-class correlation of its teacher. Extensive experiments on image classification have been conducted on four public benchmarks. For example, when the teacher and student networks are ShuffleNetV2-1.0 and ShuffleNetV2-0.5, our proposed method achieves a 42.63% top-1 error rate on Tiny ImageNet. (C) 2021 Elsevier B.V. All rights reserved.
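For orientation, the sketch below shows standard Hinton-style knowledge distillation together with one plausible way a student could be trained to mimic a teacher's inter-class correlation. It is a minimal illustration only: the function names, the temperature and weighting values, and the choice of a batch-level class-correlation matrix are assumptions for this sketch, not the exact regularization proposed in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective: cross-entropy on true labels plus KL divergence
    between temperature-softened teacher and student distributions."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kl + (1.0 - alpha) * ce

def inter_class_correlation(logits):
    """Hypothetical inter-class correlation: correlation between class-probability
    columns across a mini-batch (an assumption, not the paper's exact formulation)."""
    p = F.softmax(logits, dim=1)             # (batch, classes)
    p = p - p.mean(dim=0, keepdim=True)      # center each class column
    cov = p.t() @ p / (p.size(0) - 1)        # class-by-class covariance
    std = cov.diag().clamp_min(1e-8).sqrt()
    return cov / (std.unsqueeze(0) * std.unsqueeze(1))

def correlation_mimic_loss(student_logits, teacher_logits):
    """Encourage the student's inter-class correlation to match the teacher's."""
    return F.mse_loss(
        inter_class_correlation(student_logits),
        inter_class_correlation(teacher_logits).detach(),
    )
```

In use, one would add `correlation_mimic_loss` (weighted by a hyperparameter) to `kd_loss` when training the student; an analogous regularizer could be applied while training the teacher, as the abstract describes, to make its inter-class correlation more explicit.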