4.7 Article

Multi-modal graph contrastive encoding for neural machine translation

Journal

ARTIFICIAL INTELLIGENCE
Volume 323, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.artint.2023.103986

Keywords

Multi -modal neural machine translation; Graph neural networks; Contrastive learning

Ask authors/readers for more resources

This paper proposes a graph-based multi-modal fusion encoder for neural machine translation. It represents the input sentence and image using a unified multi-modal graph, and learns node representations through graph-based fusion layers. Experimental results show that the proposed model achieves significant improvements over baselines and achieves state-of-the-art performance on the Multi30K dataset.
As an important extension of conventional text-only neural machine translation (NMT), multi-modal neural machine translation (MNMT) aims to translate input source sentences paired with images into the target language. Although a lot of MNMT models have been proposed to perform multi-modal semantic fusion, they do not consider fine-grained semantic correspondences between semantic units of different modalities (i.e., words and visual objects), which can be exploited to refine multi-modal representation learning via fine-grained semantic interactions. To address this issue, we propose a graph-based multi -modal fusion encoder for NMT. Concretely, we first employ a unified multi-modal graph to represent the input sentence and image, in which the multi-modal semantic units are considered as the nodes in the graph, connected by two kinds of edges with different semantic relationships. Then, we stack multiple graph-based multi-modal fusion layers that iteratively conduct intra-and inter-modal interactions to learn node representations. Finally, via an attention mechanism, we induce a multi-modal context from the top node representations for the decoder. Particularly, we introduce a progressive contrastive learning strategy based on the multi-modal graph to refine the training of our proposed model, where hard negative samples are introduced gradually. To evaluate our model, we conduct experiments on commonly-used datasets. Experimental results and analysis show that our MNMT model obtains significant improvements over competitive baselines, achieving state-of-the-art performance on the Multi30K dataset.& COPY; 2023 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available