Article

Reusing Monolingual Pre-Trained Models by Cross-Connecting Seq2seq Models for Machine Translation

Journal

APPLIED SCIENCES-BASEL
Volume 11, Issue 18, Pages: -

Publisher

MDPI
DOI: 10.3390/app11188737

Keywords

natural language processing; transfer learning; neural machine translation

Funding

  1. National Research Foundation of Korea (NRF) - Korea government (MSIT) [2018R1A5A7059549, 2020R1A2C1014037]
  2. Institute of Information & communications Technology Planning & Evaluation (IITP) - Korea government (MSIT) (Artificial Intelligence Graduate School Program (Hanyang University)) [2020-0-01373]
  3. National Research Foundation of Korea [2020R1A2C1014037] - Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Abstract

This work uses sequence-to-sequence (seq2seq) models pre-trained on monolingual corpora for machine translation. We pre-train two seq2seq models on monolingual corpora for the source and target languages, then combine the encoder of the source-language model with the decoder of the target-language model, i.e., the cross-connection. Because the two modules are pre-trained completely independently, we add an intermediate layer between the pre-trained encoder and decoder to help them map to each other. These monolingual pre-trained models can serve as a multilingual pre-trained model, because one model can be cross-connected with a model pre-trained on any other language, while its capacity is not affected by the number of languages. We demonstrate that our method significantly improves translation performance over the random baseline. Moreover, we analyze the appropriate choice of the intermediate layer, the importance of each part of a pre-trained model, and how performance changes with the size of the bitext.
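The cross-connection described in the abstract can be sketched in code. The following PyTorch sketch is illustrative only and is not the authors' implementation: the class name CrossConnectedSeq2Seq, the layer sizes, and the choice of a single Transformer encoder layer as the intermediate layer are assumptions, and randomly initialized modules stand in for the two monolingual pre-trained seq2seq models. Positional encodings and attention masks are omitted for brevity.

import torch
import torch.nn as nn

D_MODEL, N_HEAD, VOCAB = 512, 8, 32000   # illustrative sizes, not taken from the paper

class CrossConnectedSeq2Seq(nn.Module):
    """Joins a source-language encoder and a target-language decoder (hypothetical sketch)."""

    def __init__(self, src_embed, src_encoder, tgt_embed, tgt_decoder):
        super().__init__()
        self.src_embed = src_embed        # embeddings of the source-language pre-trained model
        self.encoder = src_encoder        # encoder of the source-language pre-trained model
        self.tgt_embed = tgt_embed        # embeddings of the target-language pre-trained model
        self.decoder = tgt_decoder        # decoder of the target-language pre-trained model
        # Intermediate layer bridging the two independently pre-trained representation spaces;
        # a single Transformer encoder layer is only one possible choice (an assumption here).
        self.intermediate = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.generator = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.src_embed(src_ids))        # source-side hidden states
        memory = self.intermediate(memory)                    # map toward the decoder's space
        out = self.decoder(self.tgt_embed(tgt_ids), memory)   # target-side decoding
        return self.generator(out)                            # logits over the target vocabulary

# Randomly initialized stand-ins for the two monolingual pre-trained models.
src_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True), num_layers=6)
tgt_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(D_MODEL, N_HEAD, batch_first=True), num_layers=6)
model = CrossConnectedSeq2Seq(nn.Embedding(VOCAB, D_MODEL), src_encoder,
                              nn.Embedding(VOCAB, D_MODEL), tgt_decoder)

logits = model(torch.randint(0, VOCAB, (2, 10)), torch.randint(0, VOCAB, (2, 7)))
print(logits.shape)   # torch.Size([2, 7, 32000])

In this sketch, swapping in a decoder (and its embeddings) pre-trained on a different target language yields a new translation direction without retraining the source-side encoder, which reflects how one monolingual model can be cross-connected with models for other languages.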

