Article

Variational multimodal machine translation with underlying semantic

Journal

INFORMATION FUSION
Volume 69, Pages 73-80

Publisher

ELSEVIER
DOI: 10.1016/j.inffus.2020.11.011

Keywords

Machine translation; Variational neural machine translation; Multimodal learning

Funding

  1. National Natural Science Foundation of China [62076096]
  2. Shanghai Municipal Project, China [20511100900]
  3. Shanghai Knowledge Service Platform, China Project [ZF1213]

This paper introduces a variational multimodal machine translation (VMMT) model that uses visual and textual information to model uncertainty in language. The model employs multitask learning to reduce the gap between the semantic representations of different modalities and applies information bottleneck theory to filter out redundancy.
Capturing the underlying semantic relationships of sentences is helpful for machine translation. Variational neural machine translation approaches provide an effective way to model the uncertain underlying semantics of languages by introducing latent variables, and multitask learning has been applied in multimodal machine translation to integrate multimodal data. However, these approaches usually lack a strong interpretation of how out-of-text information is used in machine translation tasks. In this paper, we propose a novel architecture-free multimodal translation model, called variational multimodal machine translation (VMMT), under the variational framework, which can model the uncertainty in languages caused by ambiguity by exploiting visual and textual information. In addition, the proposed model eliminates the discrepancy between training and prediction found in existing variational translation models by constructing encoders that rely only on source data. More importantly, the proposed multimodal translation model is designed as multitask learning, in which a shared semantic representation for the different modalities is learned and the gap among the semantic representations of the various modalities is reduced by incorporating additional constraints. Moreover, information bottleneck theory is adopted in our variational encoder-decoder model, which helps the encoder filter out redundancy and the decoder concentrate on useful information. Experiments on multimodal machine translation demonstrate that the proposed model is competitive.
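
To make the ingredients described in the abstract concrete, the following is a minimal sketch, assuming a PyTorch-style implementation, of how such a variational objective can be assembled: Gaussian latent encoders for the text and image features, a KL term to a standard normal prior acting as the information bottleneck, and a second KL term that pulls the two modality posteriors together as the multitask alignment constraint. All names (VariationalEncoder, vmmt_loss, beta, gamma) and the exact form of the loss are illustrative assumptions for exposition, not the authors' published implementation.

  import torch
  import torch.nn as nn

  class VariationalEncoder(nn.Module):
      # Maps a pooled feature vector to the mean and log-variance of a diagonal Gaussian latent.
      def __init__(self, in_dim, latent_dim):
          super().__init__()
          self.mu = nn.Linear(in_dim, latent_dim)
          self.logvar = nn.Linear(in_dim, latent_dim)

      def forward(self, h):
          return self.mu(h), self.logvar(h)

  def kl_to_standard_normal(mu, logvar):
      # Information-bottleneck-style regularizer: KL( q(z|x) || N(0, I) ).
      return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)

  def kl_between_gaussians(mu_a, logvar_a, mu_b, logvar_b):
      # KL( N(mu_a, var_a) || N(mu_b, var_b) ), used to pull the image posterior toward the text posterior.
      return 0.5 * torch.sum(
          logvar_b - logvar_a
          + (logvar_a.exp() + (mu_a - mu_b).pow(2)) / logvar_b.exp()
          - 1.0,
          dim=-1,
      )

  def vmmt_loss(text_h, image_h, text_enc, image_enc, decode_logprob, beta=1.0, gamma=1.0):
      # text_h, image_h   : pooled source-sentence and image features, shape (batch, in_dim)
      # decode_logprob(z) : log-likelihood of the target sentence given a latent sample
      # beta, gamma       : weights of the bottleneck and cross-modal alignment terms
      mu_t, logvar_t = text_enc(text_h)
      mu_v, logvar_v = image_enc(image_h)

      # Reparameterization trick: sample z from the text posterior, the only
      # branch needed at prediction time.
      z = mu_t + torch.randn_like(mu_t) * (0.5 * logvar_t).exp()

      recon = -decode_logprob(z)                                    # translation term
      bottleneck = kl_to_standard_normal(mu_t, logvar_t)            # filter redundancy
      align = kl_between_gaussians(mu_v, logvar_v, mu_t, logvar_t)  # close the modality gap

      return (recon + beta * bottleneck + gamma * align).mean()

Decoding from the text posterior alone is consistent with the abstract's point that the encoders rely only on source data, so no image is required at prediction time; the image branch contributes only through the alignment term during training.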
