Article

Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation

Journal

IEEE Access
Volume 8, Pages 206638-206645

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/ACCESS.2020.3037821

Keywords

Training; Decoding; Vocabulary; Task analysis; Standards; Knowledge engineering; Computational modeling; Neural machine translation; knowledge distillation; prior knowledge

Funding

  1. Western Light of the Chinese Academy of Sciences [2017-XBQNXZ-A-005]
  2. National Natural Science Foundation of China [U1703133]
  3. Subsidy of the Youth Innovation Promotion Association of the Chinese Academy of Sciences [2017472]
  4. Major Science and Technology Project of Xinjiang Uygur Autonomous Region [2016A03007-3]
  5. Tianshan Excellent Young Scholars of Xinjiang [2019Q031]

Abstract

Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information about similarities among categories provided by the teacher model, and accordingly, only strong teacher models are deployed to teach weaker students in practice. In low-resource neural machine translation, however, a stronger teacher model is not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution that acts as a virtual teacher. This hand-crafted prior distribution not only captures similarity information among words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves performance on low-resource languages.
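
The abstract only sketches the approach, so the following is a minimal illustrative sketch, not the authors' released code, of how a teacher-free distillation loss with a hand-designed virtual teacher can be written in PyTorch. The function name teacher_free_kd_loss and the hyperparameters peak_prob, temperature, and alpha are assumptions made for illustration and are not values reported in the paper.

# Minimal sketch, assuming a PyTorch student model; all hyperparameter values are illustrative.
import torch
import torch.nn.functional as F

def teacher_free_kd_loss(student_logits, targets, vocab_size,
                         peak_prob=0.9, temperature=2.0, alpha=0.5):
    """Cross-entropy on gold tokens combined with KL divergence against a
    manually designed 'virtual teacher': a distribution that places peak_prob
    on the gold token and spreads the remaining mass uniformly over the rest
    of the vocabulary (a hypothetical prior, not the paper's exact recipe)."""
    # Standard translation loss on the gold target tokens.
    ce = F.cross_entropy(student_logits, targets)

    # Manually designed prior distribution acting as the virtual teacher.
    smooth = (1.0 - peak_prob) / (vocab_size - 1)
    virtual_teacher = torch.full_like(student_logits, smooth)
    virtual_teacher.scatter_(1, targets.unsqueeze(1), peak_prob)

    # Distillation term: softened student predictions vs. the prior.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(student_log_probs, virtual_teacher, reduction="batchmean")

    return (1.0 - alpha) * ce + alpha * (temperature ** 2) * kd

# Example usage with random data (batch of 8 target tokens, 32k-word vocabulary).
logits = torch.randn(8, 32000)
gold = torch.randint(0, 32000, (8,))
loss = teacher_free_kd_loss(logits, gold, vocab_size=32000)

In this formulation the virtual teacher is essentially a label-smoothing-style prior, which is the general flavor of teacher-free distillation; the exact prior design used in the paper may differ.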
