Article

Multi-target Knowledge Distillation via Student Self-reflection

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 131, Issue 7, Pages 1857-1874

Publisher

SPRINGER
DOI: 10.1007/s11263-023-01792-z

Keywords

Knowledge distillation; Self-reflection learning; Model compression; Deep learning


Knowledge distillation is an effective technique for compressing deep models by transferring knowledge from a large teacher model to a small student model. Existing methods mainly focus on unidirectional knowledge transfer, overlooking the effectiveness of students' self-reflection in real-world education scenarios. To address this, we propose a new framework called MTKD-SSR that enhances the teacher's ability to transfer knowledge and improves the student's capacity to absorb knowledge through self-reflection.
Knowledge distillation is a simple yet effective technique for deep model compression, which aims to transfer the knowledge learned by a large teacher model to a small student model. To mimic how the teacher teaches the student, existing knowledge distillation methods mainly adopt a unidirectional knowledge transfer, where the knowledge extracted from different intermediate layers of the teacher model is used to guide the student model. However, students learn more effectively through multi-stage learning with self-reflection in real-world education scenarios, a fact that current knowledge distillation methods ignore. Inspired by this, we devise a new knowledge distillation framework, multi-target knowledge distillation via student self-reflection (MTKD-SSR), which not only enhances the teacher's ability to unfold the knowledge to be distilled, but also improves the student's capacity to digest that knowledge. Specifically, the proposed framework consists of three target knowledge distillation mechanisms: stage-wise channel distillation (SCD), stage-wise response distillation (SRD), and cross-stage review distillation (CRD). SCD and SRD transfer feature-based knowledge (i.e., channel features) and response-based knowledge (i.e., logits) at different stages, respectively, while CRD encourages the student model to conduct self-reflective learning after each stage through self-distillation of the response-based knowledge. Experimental results on five popular visual recognition datasets, CIFAR-100, Market-1501, CUB200-2011, ImageNet, and Pascal VOC, demonstrate that the proposed framework significantly outperforms recent state-of-the-art knowledge distillation methods.
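The abstract only names the three loss components, so the following is a minimal PyTorch-style sketch of how such a combined objective could be assembled, assuming both networks expose per-stage channel features and per-stage logits with matching feature dimensions. All function names, loss weights (w_scd, w_srd, w_crd), the temperature T, and the concrete loss forms are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of an MTKD-SSR-style training objective (assumptions noted below).
import torch
import torch.nn.functional as F

def channel_distillation(f_s, f_t):
    """Stage-wise channel distillation (SCD) sketch: match per-channel feature descriptors.

    Assumes the student's stage features have already been projected to the
    teacher's channel dimension so the two descriptors are comparable.
    """
    s = F.normalize(f_s.mean(dim=(2, 3)), dim=1)  # [B, C] via global average pooling
    t = F.normalize(f_t.mean(dim=(2, 3)), dim=1)
    return F.mse_loss(s, t)

def response_distillation(logits_s, logits_t, T=4.0):
    """Stage-wise response distillation (SRD) sketch: KL divergence between softened logits."""
    p_t = F.softmax(logits_t / T, dim=1)
    log_p_s = F.log_softmax(logits_s / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def mtkd_ssr_loss(student_feats, student_logits, teacher_feats, teacher_logits,
                  labels, T=4.0, w_scd=1.0, w_srd=1.0, w_crd=1.0):
    """Combine SCD, SRD, and a CRD-style self-distillation term with cross-entropy.

    student_feats / teacher_feats: lists of stage feature maps [B, C, H, W]
    student_logits / teacher_logits: lists of stage logits [B, num_classes]
    """
    # Supervised loss on the student's final prediction.
    loss = F.cross_entropy(student_logits[-1], labels)
    # SCD + SRD: teacher-to-student transfer at every stage.
    for f_s, f_t, z_s, z_t in zip(student_feats, teacher_feats,
                                  student_logits, teacher_logits):
        loss = loss + w_scd * channel_distillation(f_s, f_t)
        loss = loss + w_srd * response_distillation(z_s, z_t, T)
    # CRD (self-reflection) sketch: distill the student's own final response,
    # detached, back into its earlier stages.
    final = student_logits[-1].detach()
    for z_s in student_logits[:-1]:
        loss = loss + w_crd * response_distillation(z_s, final, T)
    return loss
```

The sketch simply sums the three terms with fixed weights; how the paper actually schedules or normalizes the stage-wise losses is not specified in the abstract.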

