Article

Adaptive multi-teacher multi-level knowledge distillation

NEUROCOMPUTING (2020)

Journal

NEUROCOMPUTING

Volume 415, Pages 106-113

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2020.07.048

Keywords

Knowledge distillation; Adaptive learning; Multi-teacher

Funding

NSFC [61702190, 61672236, 61672231]
NSFC-Zhejiang [U1609220]

Abstract

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of light-weight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies have one of two limitations. Some learn from only a single teacher, which neglects the potential for a student to learn from multiple teachers simultaneously. Others simply treat all teachers as equally important, which cannot reveal how the importance of each teacher differs for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights, which are leveraged to acquire integrated soft targets (high-level knowledge); and (ii) enabling intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers through the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate that the proposed learning framework enables the student to achieve better performance than strong competitors. (C) 2020 Elsevier B.V. All rights reserved.
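The abstract does not give the exact formulas, so what follows is only a minimal sketch of idea (i) in plain Python, under stated assumptions. Each teacher's latent representation is treated as a fixed vector, and a teacher's instance-level importance is assumed to be a softmax over dot products between a per-example feature vector and those latents. The integrated soft target is taken to be the weighted mixture of the teachers' temperature-softened predictions, and the student is trained against it with a KL-divergence loss scaled by T squared, as in standard KD. The function names (`teacher_weights`, `integrated_soft_target`, `kd_loss`) and all numbers are hypothetical. In AMTML-KD the latents and weights are learned jointly with the student, which this sketch omits, and the multi-group hint strategy of (ii) is not modeled here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def teacher_weights(instance_feature: Sequence[float],
                    teacher_latents: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMPTION: importance score = <instance feature, teacher latent>,
    normalized across teachers with a softmax so the weights sum to 1."""
    scores = [dot(instance_feature, latent) for latent in teacher_latents]
    return softmax(scores)


def integrated_soft_target(teacher_logits: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           temperature: float) -> List[float]:
    """Weighted mixture of the teachers' temperature-softened class distributions."""
    soft = [softmax(logits, temperature) for logits in teacher_logits]
    num_classes = len(soft[0])
    return [sum(w * p[c] for w, p in zip(weights, soft)) for c in range(num_classes)]


def kd_loss(student_logits: Sequence[float],
            target: Sequence[float],
            temperature: float) -> float:
    """KL(target || softened student), scaled by T^2 as in standard KD."""
    student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(target, student) if t > 0)
    return temperature ** 2 * kl


# Toy example: two teachers, three classes, 2-d latent space (all values made up).
teacher_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
teacher_latents = [[1.0, 0.0], [0.0, 1.0]]
instance_feature = [0.8, 0.2]
T = 4.0

weights = teacher_weights(instance_feature, teacher_latents)
target = integrated_soft_target(teacher_logits, weights, T)
loss = kd_loss([1.0, 0.2, -0.5], target, T)
print("teacher weights:", [round(w, 3) for w in weights])
print("KD loss:", round(loss, 4))
```

Because this example feature lies closer to the first teacher's latent, that teacher receives the larger weight and dominates the integrated soft target. A different input would shift the weights, which is the per-example behavior the abstract contrasts with treating all teachers equally.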