Article

Learning multimodal word representation with graph convolutional networks

Journal

INFORMATION PROCESSING & MANAGEMENT
Volume 58, Issue 6, Article 102709

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2021.102709

Keywords

Natural language processing; Word representation; Multimodal word representation; Graph convolutional network

Funding

  1. National Natural Science Foundation of China [61572434, 91630206]
  2. Shanghai Science and Technology Committee [19DZ2204800]
  3. National Key R&D Program of China [2017YFB0701501]

Abstract

Multimodal models have been shown to outperform text-based models at learning semantic word representations. According to psycholinguistic theory, there is a graph-like relationship among the modalities of language, and in recent years the graph convolutional network (GCN) has proven to have substantial advantages in extracting features from non-Euclidean spatial structures. This inspires us to propose a new multimodal word representation model, GCNW, which uses a graph convolutional network to incorporate phonetic and syntactic information into word representations. We use a greedy strategy to update the modality-relation matrix in the GCN, and we train the model through unsupervised learning. We evaluated the proposed model on multiple downstream NLP tasks, and the experimental results demonstrate that GCNW outperforms strong unimodal baselines and state-of-the-art multimodal models. We release the source code to encourage reproducible research.
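
To make the abstract's mechanism concrete, the following is a minimal, illustrative Python/NumPy sketch of one graph-convolution step over a modality-relation matrix. All names, dimensions, and the toy adjacency are assumptions for illustration only; the paper's actual architecture and its greedy update of the modality-relation matrix are not reproduced here.

    import numpy as np

    def normalize_adjacency(A):
        """Symmetrically normalize an adjacency matrix: D^{-1/2} (A + I) D^{-1/2}."""
        A_hat = A + np.eye(A.shape[0])          # add self-loops
        deg = A_hat.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
        return D_inv_sqrt @ A_hat @ D_inv_sqrt

    def gcn_layer(H, A_norm, W):
        """One graph-convolution step: propagate node features over relations."""
        return np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation

    # Toy example: 4 "words", each node holding a concatenation of textual,
    # phonetic, and syntactic features (dimensions are illustrative).
    rng = np.random.default_rng(0)
    H = rng.normal(size=(4, 8))                 # initial node features
    A = np.array([[0, 1, 1, 0],                 # hypothetical modality-relation matrix
                  [1, 0, 0, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    W = rng.normal(size=(8, 8))                 # learnable weight matrix

    H_out = gcn_layer(H, normalize_adjacency(A), W)
    print(H_out.shape)                          # (4, 8): fused multimodal embeddings

In this sketch the adjacency matrix plays the role of the modality-relation matrix: entries would encode how strongly the phonetic and syntactic views of two words are related, and in GCNW these relations are updated greedily during training rather than fixed as above.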
