Article

Deep Generative Knowledge Distillation by Likelihood Finetuning

IEEE ACCESS (2023)

Journal

IEEE ACCESS

Volume 11, Issue -, Pages 46441-46453

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2023.3273952

Keywords

Data models; Training; Maximum likelihood estimation; Task analysis; Blockchains; Generators; Image quality; Knowledge engineering; Deep learning; Generative adversarial networks; Knowledge distillation; deep generative model; image quality evaluation; data-free knowledge distillation

Categories

Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications

Summary

Knowledge Distillation (KD) is used to train smaller student models using a larger pretrained teacher model. Data-Free KD (DFKD) methods have been proposed to address privacy concerns in decentralized data systems such as blockchain; they extract prior knowledge from teacher networks and use it to synthesize data for KD. This paper introduces Generative Knowledge Distillation (GenKD), a new DFKD framework that uses deep generative models (DGMs) to reduce the search space of data generation and produce high-quality pseudo samples.

Abstract

Knowledge Distillation (KD) is designed to train smaller student models using a larger pretrained teacher model. However, in decentralized data systems such as blockchain, privacy concerns may make the training data inaccessible. To address this issue, Data-Free KD (DFKD) methods have been proposed, which extract prior knowledge from teacher networks and use it to synthesize data for KD. Previous DFKD methods faced challenges because of the large search space of data generation. Recently, deep generative models (DGMs) have been proposed to learn data distributions with deep networks, which offers an efficient way to reduce this search space by generating a set of pseudo data. In this paper, we examine the performance of KD trained on pseudo samples generated by pretrained DGMs and find that KD performance is not always positively correlated with the image quality of those samples. Based on this observation, we propose a new DFKD framework, Generative Knowledge Distillation (GenKD), which reduces the search space by constructing a prior distribution modeled by DGMs, exploiting their capacity for likelihood estimation. Specifically, we use energy-based models (EBMs) to generate data, guided by both the maximum likelihood estimation (MLE) of the EBM and gradients from the downstream KD task obtained via policy gradient. We then train the student model using the pretrained teacher model and the pseudo samples. We implement GenKD on several widely used benchmarks, including CIFAR100, CIFAR10, and SVHN. Our experiments show that GenKD generates high-quality pseudo samples, both quantitatively and qualitatively. Additionally, the top-1 accuracy of the student network can approach that of state-of-the-art (SOTA) DFKD methods while using fewer pseudo samples and less generation time.
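The abstract combines two generic ingredients: distilling a teacher's softened outputs into a student, and using policy gradient to carry a downstream KD signal back to a sampler that cannot be differentiated through. The sketch below illustrates both in plain Python under stated assumptions; it is not the authors' GenKD implementation. It assumes the standard temperature-scaled KD loss, replaces the EBM with a toy categorical "generator" over a few candidate pseudo samples, and takes the reward to be the teacher-student KD loss on each sample (a choice borrowed from adversarial DFKD, not confirmed by the abstract). The names `softmax`, `kd_loss`, `reinforce_grad`, and the toy logits are all hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Standard distillation loss T^2 * KL(p_teacher || p_student) on softened outputs.

    Assumption: this is the generic KD objective, not necessarily GenKD's exact loss.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


def reinforce_grad(theta, reward_fn, num_samples=2000, rng=None):
    """Score-function (REINFORCE) estimate of d/dtheta E_{x ~ p_theta}[R(x)].

    Hypothetical toy generator: p_theta = softmax(theta) over candidate indices,
    standing in for the EBM sampler. For this categorical distribution,
    d log p_theta(x) / d theta_k = 1[k == x] - p_theta(k), so the reward itself
    never needs to be differentiated. Subtracting the mean reward as a baseline
    reduces the variance of the estimate.
    """
    rng = rng or random.Random(0)
    probs = softmax(theta)
    xs = rng.choices(range(len(theta)), weights=probs, k=num_samples)
    rewards = [reward_fn(x) for x in xs]
    baseline = sum(rewards) / num_samples
    grad = [0.0] * len(theta)
    for x, r in zip(xs, rewards):
        for k, p_k in enumerate(probs):
            score = (1.0 if k == x else 0.0) - p_k
            grad[k] += (r - baseline) * score
    return [g / num_samples for g in grad]


if __name__ == "__main__":
    # Hypothetical teacher/student logits for 3 candidate pseudo samples (3 classes).
    teacher = [[4.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 1.0, 1.0]]
    student = [[3.5, 1.2, 0.1], [2.0, 0.5, 0.5], [1.0, 1.0, 1.0]]

    # Assumed reward: KD loss on the sample, i.e. how much the student still
    # disagrees with the teacher there (borrowed from adversarial DFKD).
    def reward(x):
        return kd_loss(teacher[x], student[x])

    theta = [0.0, 0.0, 0.0]
    for step in range(50):
        grad = reinforce_grad(theta, reward, rng=random.Random(step))
        theta = [t + 0.5 * g for t, g in zip(theta, grad)]  # gradient ascent on E[R]

    # Mass should concentrate on sample 1, where teacher and student disagree most.
    print([round(p, 3) for p in softmax(theta)])
```

The score-function estimator is used here because it only needs samples and their rewards, which matches the abstract's point that KD gradients reach the generator "by policy gradient" rather than by backpropagating through sampling. The likelihood (MLE) term on the EBM is deliberately left out, since the abstract does not specify how it is combined with the policy-gradient signal.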
