Article

Deep Generative Knowledge Distillation by Likelihood Finetuning

Journal

IEEE ACCESS
Volume 11, Pages 46441-46453

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3273952

Keywords

Data models; Training; Maximum likelihood estimation; Task analysis; Blockchains; Generators; Image quality; Knowledge engineering; Deep learning; Generative adversarial networks; Knowledge distillation; deep generative model; image quality evaluation; data-free knowledge distillation


Knowledge Distillation (KD) is used to train smaller student models with a larger pretrained teacher model. Data-Free KD (DFKD) methods have been proposed to address privacy concerns in decentralized data systems such as blockchains. These methods extract prior knowledge from teacher networks and synthesize data for KD. This paper introduces Generative Knowledge Distillation (GenKD), a new DFKD framework that uses deep generative models (DGMs) to reduce the search space of data generation and produce high-quality pseudo samples.
Knowledge Distillation (KD) is designed to train smaller student models using a larger pretrained teacher model. However, in decentralized data systems such as blockchains, privacy concerns may make the training data inaccessible. To address this issue, Data-Free KD (DFKD) methods have been proposed. These methods extract prior knowledge from teacher networks and use it to synthesize data for KD. Previous DFKD methods struggled with the large search space of data generation. Deep generative models (DGMs) learn data distributions with deep networks, so they offer an efficient way to shrink this search space by generating a set of pseudo data. In this paper, we examine the performance of KD trained on pseudo samples generated by pretrained DGMs. We find that KD performance is not always positively correlated with the image quality of those samples. Based on this observation, we propose a new DFKD framework called Generative Knowledge Distillation (GenKD). GenKD reduces the search space by constructing a prior distribution modeled by DGMs, exploiting their capacity for likelihood estimation. Specifically, we use energy-based models (EBMs) to generate data, guided both by the Maximum Likelihood Estimation (MLE) of the EBM and by gradients from the downstream KD task obtained via policy gradient. We then train the student model using the pretrained teacher model and these pseudo samples. We evaluate GenKD on several widely used benchmarks, including CIFAR100, CIFAR10, and SVHN. Our experiments show that GenKD generates high-quality pseudo samples, both quantitatively and qualitatively. Additionally, the student network's top-1 accuracy approaches that of state-of-the-art (SOTA) DFKD methods while using fewer pseudo samples and less generation time.
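The abstract names its building blocks but does not give their formulas. The Python below is a rough, non-authoritative sketch of two of them. Part (a) is the temperature-scaled distillation loss of Hinton et al., used here as an assumed stand-in for GenKD's actual KD objective. Part (b) is a REINFORCE (score-function) policy-gradient update, the generic way a scalar downstream signal can be turned into a gradient for a sampling distribution. The candidate pool, its logits, the reward definition, and all function names are hypothetical. The EBM and its MLE training are not modeled.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Hinton-style distillation loss: T^2 * KL(p_teacher || p_student).
    Assumed stand-in; the paper's exact KD objective may differ."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def reinforce_grad(policy_logits, action, reward):
    """REINFORCE estimate of d/dlogits [reward * log pi(action)] for a
    categorical policy pi = softmax(policy_logits): reward * (one_hot - pi).
    `reward` may also be an advantage (reward minus a baseline)."""
    probs = softmax(policy_logits)
    return [reward * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical pool of candidate pseudo samples, each summarized by the
    # (teacher logits, student logits) it would produce.
    candidates = [
        ([2.0, 0.5, -1.0], [2.0, 0.4, -1.0]),  # student already agrees
        ([0.1, 3.0, -0.5], [1.5, 0.2, 0.3]),   # student disagrees
        ([-1.0, 0.0, 2.5], [-0.8, 0.1, 2.0]),
    ]
    policy_logits = [0.0] * len(candidates)  # toy stand-in for a generator
    learning_rate = 0.5
    baseline = 0.0
    for _ in range(200):
        probs = softmax(policy_logits)
        action = random.choices(range(len(candidates)), weights=probs)[0]
        teacher, student = candidates[action]
        # Hypothetical reward: favor samples on which the student still
        # disagrees with the teacher.
        reward = kd_loss(teacher, student)
        advantage = reward - baseline
        grad = reinforce_grad(policy_logits, action, advantage)
        # Gradient ascent on expected reward.
        policy_logits = [z + learning_rate * g for z, g in zip(policy_logits, grad)]
        # Moving-average baseline to reduce the estimator's variance.
        baseline = 0.9 * baseline + 0.1 * reward
    print([round(p, 3) for p in softmax(policy_logits)])
```

In this toy, a policy over three fixed candidates plays the role of the generator. Subtracting a moving-average baseline is a standard REINFORCE refinement, not a detail taken from the paper.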

