Article

Deep Generative Knowledge Distillation by Likelihood Finetuning

Journal

IEEE ACCESS
Volume 11, Issue -, Pages 46441-46453

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3273952

Keywords

Data models; Training; Maximum likelihood estimation; Task analysis; Blockchains; Generators; Image quality; Knowledge engineering; Deep learning; Generative adversarial networks; Knowledge distillation; Deep generative model; Image quality evaluation; Data-free knowledge distillation

Knowledge Distillation (KD) is used to train smaller student models using a larger pretrained teacher model. Data-Free KD (DFKD) methods have been proposed to address privacy concerns in decentralized data systems like blockchain by extracting prior knowledge from teacher networks and synthesizing data for KD. This paper introduces Generative Knowledge Distillation (GenKD), a new DFKD framework that uses deep generative models (DGMs) to reduce the search space of data generation and achieve high-quality pseudo samples.
Knowledge Distillation (KD) trains a smaller student model under the guidance of a larger pretrained teacher model. In decentralized data systems such as blockchain, however, privacy concerns can make the original training data inaccessible. Data-Free KD (DFKD) methods address this by extracting prior knowledge from the teacher network and using it to synthesize data for KD. Previous DFKD methods struggled with the large search space of data generation. Deep generative models (DGMs), which learn the data distribution with deep networks, offer an efficient way to reduce this search space by generating a set of pseudo data. In this paper, we study the performance of KD trained on pseudo samples generated by pretrained DGMs and find that it does not always correlate positively with image quality. Based on this observation, we propose a new DFKD framework, Generative Knowledge Distillation (GenKD), which reduces the search space by constructing a prior distribution modeled by DGMs, chosen for their power of likelihood estimation. Specifically, we use energy-based models (EBM) to generate data, guided by both the Maximum Likelihood Estimation (MLE) objective of the EBM and gradients from the downstream KD task obtained by policy gradient. We then train the student model using the pretrained teacher model and the pseudo samples. We implement GenKD on several widely used benchmarks, including CIFAR100, CIFAR10, and SVHN. Our experiments show that GenKD generates high-quality pseudo samples, both quantitatively and qualitatively. Additionally, the top-1 accuracy of the student network approaches that of state-of-the-art (SOTA) DFKD methods while using fewer pseudo samples and less generation time.
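
To make the teacher-student step concrete, here is a minimal sketch of a distillation loss evaluated on pseudo samples. The abstract does not state which loss GenKD uses, so this assumes the common temperature-scaled KL divergence between teacher and student outputs; the function names (`softmax`, `kd_loss`, `batch_kd_loss`) and the default temperature of 4.0 are hypothetical choices for illustration, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the standard distillation loss, used here only
    as an illustration; the paper may use a different objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi))
             for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def batch_kd_loss(teacher_batch, student_batch, temperature=4.0):
    """Mean distillation loss over a batch of pseudo samples.

    Each batch is a list of per-sample logit lists, i.e. the outputs of
    the teacher and the student on the same generated inputs.
    """
    losses = [kd_loss(t, s, temperature)
              for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)
```

In a data-free setting, the logits on both sides would come from running the teacher and the student on the same pseudo samples produced by the generator, and the student would be updated to reduce this loss. The loss is zero when the two sets of logits agree and positive otherwise.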

