Article

Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector

Journal

NEURAL NETWORKS
Volume 164, Pages 345-356

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2023.04.015

Keywords

Knowledge distillation; Linear classifier probes; Convolutional neural networks; Spatial attention; Model compression


This paper proposes a multi-teacher knowledge distillation method based on the joint guidance of a probe and an adaptive corrector (GPAC). It addresses two issues in current multi-teacher KD algorithms: the student's passive acquisition of knowledge and the use of an identical guiding scheme by all teachers. Experimental results show that the proposed method achieves higher classification accuracy.
Knowledge distillation (KD) has been widely used in model compression. However, in current multi-teacher KD algorithms, the student can only passively acquire the knowledge of the teachers' middle layers in a single form, and all teachers apply an identical guiding scheme to the student. To solve these problems, this paper proposes a multi-teacher KD method based on the joint Guidance of a Probe and an Adaptive Corrector (GPAC). First, GPAC proposes a teacher selection strategy guided by the Linear Classifier Probe (LCP). This strategy allows the student to select better teachers at the middle layer: teachers are evaluated using the classification accuracy measured by the LCP. Then, GPAC designs an adaptive multi-teacher instruction mechanism. The mechanism uses instructional weights to emphasize the student's predicted direction and reduce the student's difficulty in learning from the teachers; at the same time, each teacher can formulate its guiding scheme according to the Kullback-Leibler divergence loss between the student and itself. Finally, GPAC develops a multi-level mechanism for adjusting the spatial attention loss. This mechanism uses a piecewise function that varies with the number of epochs to adjust the spatial attention loss; the piecewise function classifies the student's learning of spatial attention into three levels, which makes efficient use of the teachers' spatial attention. GPAC and current state-of-the-art distillation methods are tested on the CIFAR-10 and CIFAR-100 datasets. The experimental results demonstrate that the proposed method obtains higher classification accuracy. © 2023 Elsevier Ltd. All rights reserved.
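The abstract describes three mechanisms but no implementation details. The PyTorch sketch below is a hypothetical reading of those mechanisms, not the authors' code: the LCP selection rule (keep teachers with above-mean probed accuracy), the sign convention in the KL-based weights, the schedule breakpoints, and all function names (lcp_accuracy, kl_guidance_weights, attention_stage_weight, gpac_loss) are illustrative assumptions. It also assumes student and teacher feature maps share spatial dimensions.

```python
# Hypothetical sketch of the three GPAC components described in the abstract.
import torch
import torch.nn.functional as F

def lcp_accuracy(features, labels, probe):
    """Accuracy of a linear classifier probe on a teacher's mid-layer features."""
    with torch.no_grad():
        logits = probe(features.flatten(start_dim=1))
        return (logits.argmax(dim=1) == labels).float().mean().item()

def kl_guidance_weights(student_logits, teacher_logits_list, T=4.0):
    """Per-teacher weights from student-teacher KL divergence. Here teachers
    whose predictions are closer to the student's guide more strongly; the
    sign convention is an assumption."""
    s = F.log_softmax(student_logits / T, dim=1)
    kls = torch.stack([
        F.kl_div(s, F.softmax(t / T, dim=1), reduction="batchmean")
        for t in teacher_logits_list
    ])
    return F.softmax(-kls, dim=0)  # smaller KL -> larger weight

def attention_stage_weight(epoch, total_epochs):
    """Piecewise, three-level schedule for the spatial-attention loss weight.
    Breakpoints and values are illustrative assumptions."""
    if epoch < total_epochs // 3:
        return 1.0      # early: imitate the teachers' attention strongly
    elif epoch < 2 * total_epochs // 3:
        return 0.5      # middle: relax the constraint
    return 0.1          # late: let the student refine on its own

def spatial_attention(feat):
    """Activation-based spatial attention map (channel-wise mean of squares)."""
    att = feat.pow(2).mean(dim=1, keepdim=True)
    return F.normalize(att.flatten(start_dim=1), dim=1)

def gpac_loss(student_logits, student_feat, teachers, labels,
              epoch, total_epochs, T=4.0):
    """Combine CE, KL-weighted multi-teacher KD, and the staged attention loss.
    `teachers` is a list of dicts with 'logits', 'feat', and 'probe' entries."""
    ce = F.cross_entropy(student_logits, labels)

    # 1) LCP-guided selection: keep teachers whose probed mid-layer accuracy
    #    is at least the mean (the selection rule is an assumption).
    accs = [lcp_accuracy(t["feat"], labels, t["probe"]) for t in teachers]
    keep = [t for t, a in zip(teachers, accs) if a >= sum(accs) / len(accs)]

    # 2) Adaptive, KL-weighted logit distillation from the selected teachers.
    t_logits = [t["logits"] for t in keep]
    w = kl_guidance_weights(student_logits, t_logits, T)
    s = F.log_softmax(student_logits / T, dim=1)
    kd = sum(
        wi * F.kl_div(s, F.softmax(t / T, dim=1), reduction="batchmean") * T * T
        for wi, t in zip(w, t_logits)
    )

    # 3) Multi-level spatial-attention loss with the epoch-dependent weight.
    s_att = spatial_attention(student_feat)
    att = sum(F.mse_loss(s_att, spatial_attention(t["feat"])) for t in keep) / len(keep)

    return ce + kd + attention_stage_weight(epoch, total_epochs) * att
```

In this reading, the probe realizes the teacher selection, the KL-derived weights realize the adaptive corrector, and the piecewise schedule realizes the three-level attention mechanism; the actual loss balancing in the paper may differ.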


