Journal
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume 33, Issue 1, Pages 366-377
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2020.3027837
Keywords
Knowledge engineering; Shape; Learning systems; Neural networks; Feature extraction; Knowledge transfer; Task analysis; Deep neural network (DNN); knowledge transfer; multitask learning; smaller network
Funding
- Institute of Information & Communications Technology Planning & Evaluation (IITP) - Korea government (MSIT) [2020-0-01389]
- Industrial Technology Innovation Program - Ministry of Trade, Industry and Energy (MI, Korea) (Development of human-friendly human-robot interaction technologies using human internal emotional states) [10073154]
Abstract
Knowledge distillation (KD) transfers knowledge from a teacher neural network to a small student network in order to improve the student's performance, and it is one of the most popular techniques for lightening convolutional neural networks (CNNs). Many KD algorithms have been proposed recently, but they still cannot properly distill the essential knowledge of the teacher network, and the transfer tends to depend on the spatial shape of the teacher's feature map. To solve these problems, we propose a method that transfers knowledge independently of the spatial shape of the teacher's feature map, using the major information obtained by decomposing the feature map through singular value decomposition (SVD). In addition, we present a multitask learning method that enables the student to learn the teacher's knowledge effectively by adaptively adjusting the teacher's constraints to the student's learning speed. Experimental results show that the proposed method performs 2.37% better on the CIFAR100 data set and 2.89% better on the TinyImageNet data set than the state-of-the-art method. The source code is publicly available at https://github.com/sseung0703/KD_methods_with_TF.
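The core idea in the abstract (making transferred knowledge independent of the feature map's spatial shape via SVD) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's exact formulation: the function name `svd_knowledge`, the feature-map shapes, and the choice of keeping the top-4 singular vectors are all assumptions for demonstration.

```python
import numpy as np

def svd_knowledge(feature_map, k=4):
    """Summarize an (H, W, C) feature map by its top-k right singular vectors.

    Flattening the spatial dimensions to an (H*W, C) matrix and keeping the
    leading singular vectors of the channel space yields a (k, C) summary
    whose size does not depend on H or W -- so teacher and student feature
    maps of different spatial shapes become directly comparable.
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)              # collapse spatial dims
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return vt[:k]                                     # shape (k, C)

# Teacher and student feature maps with different spatial sizes but the
# same channel count (values here are random placeholders).
rng = np.random.default_rng(0)
teacher = rng.random((8, 8, 16))
student = rng.random((4, 4, 16))

kt = svd_knowledge(teacher)
ks = svd_knowledge(student)
assert kt.shape == ks.shape == (4, 16)

# A distillation loss could then align the two summaries, e.g. squared
# error (the sign ambiguity of singular vectors is ignored in this sketch).
loss = np.mean((kt - ks) ** 2)
```

Because the (k, C) summaries have the same shape regardless of spatial resolution, a loss defined on them imposes no constraint on the student's feature-map dimensions, which is the shape-independence property the abstract describes.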