Article

Knowledge Transfer via Decomposing Essential Information in Convolutional Neural Networks

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2020.3027837

Keywords

Knowledge engineering; Shape; Learning systems; Neural networks; Feature extraction; Knowledge transfer; Task analysis; Deep neural network (DNN); knowledge transfer; multitask learning; smaller network

Funding

  1. Institute of Information & communications Technology Planning & Evaluation (IITP) - Korea government (MSIT) [2020-0-01389]
  2. Industrial Technology Innovation Program - Ministry of Trade, Industry and Energy (MI, Korea) (Development of human-friendly human-robot interaction technologies using human internal emotional states) [10073154]

Abstract

Knowledge distillation improves the performance of a student network by transferring knowledge from a teacher network. The proposed method uses singular value decomposition to transfer knowledge independently of the spatial shape of the teacher's feature map, and a multitask learning method adaptively adjusts the teacher's constraints to the student's learning speed. Experiments show significant improvements on multiple datasets.
Knowledge distillation (KD) improves the performance of a small student network by transferring knowledge from a larger teacher neural network, and it is one of the most popular techniques for lightening convolutional neural networks (CNNs). Many KD algorithms have been proposed recently, but they still cannot properly distill the essential knowledge of the teacher network, and the transfer tends to depend on the spatial shape of the teacher's feature map. To solve these problems, we propose a method that transfers knowledge independently of the spatial shape of the teacher's feature map: the major information obtained by decomposing the feature map through singular value decomposition (SVD). In addition, we present a multitask learning method that enables the student to learn the teacher's knowledge effectively by adaptively adjusting the teacher's constraints to the student's learning speed. Experimental results show that the proposed method performs 2.37% better on the CIFAR100 dataset and 2.89% better on the TinyImageNet dataset than the state-of-the-art method. The source code is publicly available at https://github.com/sseung0703/KD_methods_with_TF.
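
To make the shape-independent transfer concrete, below is a minimal TensorFlow sketch of SVD-based feature compression for distillation. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names `svd_features` and `svd_distill_loss`, the truncation rank `k`, and the simple sign-alignment step are all assumptions, and the paper's cross-layer alignment and adaptive multitask weighting are omitted.

```python
import tensorflow as tf

def svd_features(feat, k=4):
    """Compress a feature map [B, H, W, C] into its top-k right singular
    vectors, so the representation no longer depends on (H, W)."""
    b = tf.shape(feat)[0]
    c = feat.shape[-1]
    mat = tf.reshape(feat, [b, -1, c])            # [B, H*W, C]
    s, _, v = tf.linalg.svd(mat, full_matrices=False)
    # Weight the k leading right singular vectors by their normalized
    # singular values, so dominant components contribute more.
    s_k = s[:, :k] / (tf.norm(s[:, :k], axis=1, keepdims=True) + 1e-8)
    return v[:, :, :k] * s_k[:, tf.newaxis, :]    # [B, C, k]

def svd_distill_loss(teacher_feat, student_feat, k=4):
    """L2 distance between SVD-compressed teacher and student features.
    Assumes matching channel counts; add a 1x1 projection otherwise."""
    zt = tf.stop_gradient(svd_features(teacher_feat, k))
    zs = svd_features(student_feat, k)
    # Singular vectors are defined only up to sign, so flip each student
    # vector toward its teacher counterpart before comparing.
    sign = tf.sign(tf.reduce_sum(zt * zs, axis=1, keepdims=True))
    return tf.reduce_mean(tf.square(zt - sign * zs))
```

Because the compressed representation has shape [B, C, k], teacher and student feature maps of different spatial sizes can be compared directly, which is the property the abstract highlights.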
