Multilingual Denoising Pre-training for Neural Machine Translation

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (2020)

Journal

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS

Volume 8, Pages 726-742

Publisher

MIT PRESS

DOI: 10.1162/tacl_a_00343

Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
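
As a rough illustration of what "denoising full texts" can mean in practice, the sketch below corrupts a short document by shuffling its sentences and replacing random token spans with a single mask token; a sequence-to-sequence model would then be trained to reconstruct the original text from the corrupted version. The abstract does not state the noise settings, so everything here is an assumption for illustration: the function names `sample_poisson` and `add_noise`, the `<mask>` token string, whitespace tokenization, and the defaults `mask_ratio=0.35` and `lam=3.5` (values taken from the full paper's reported setup, not from this abstract). It is not the authors' implementation.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask token string, chosen for illustration


def sample_poisson(lam, rng):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def add_noise(sentences, mask_ratio=0.35, lam=3.5, seed=0):
    """Return a corrupted copy of a document given as a list of sentences.

    Simplifications (assumptions): whitespace tokenization, span lengths
    clamped to at least 1, and no language-ID or sentence-boundary tokens.
    """
    rng = random.Random(seed)

    # Step 1: permute the order of the sentences.
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    tokens = " ".join(shuffled).split()

    # Step 2: replace token spans with a single mask token until roughly
    # mask_ratio of the original tokens have been removed.
    budget = int(round(mask_ratio * len(tokens)))
    masked = 0
    attempts = 0
    while masked < budget and attempts < 1000:
        attempts += 1
        span = max(sample_poisson(lam, rng), 1)
        span = min(span, budget - masked, len(tokens))
        start = rng.randrange(0, len(tokens) - span + 1)
        if MASK in tokens[start:start + span]:
            continue  # do not overlap a span that is already masked
        tokens[start:start + span] = [MASK]
        masked += span
    return " ".join(tokens)


if __name__ == "__main__":
    document = ["The cat sat on the mat.", "It was a sunny day."]
    print("source:", add_noise(document, seed=1))
    print("target:", " ".join(document))
```

The training pair is then (corrupted text, original text). Because the same seed always yields the same corruption, the sketch is easy to inspect; a real pipeline would draw fresh noise for every training example.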
