Article

Image captioning with transformer and knowledge graph

Journal

PATTERN RECOGNITION LETTERS
Volume 143, Pages 43-49

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2020.12.020

Keywords

Image captioning; Transformer; Knowledge graph

Funding

  1. National Key R&D Program of China [2018AAA0100100]
  2. National Natural Science Foundation of China [61702095]
  3. Natural Science Foundation of Jiangsu Province [BK20190341]

Abstract

This paper applies the Transformer model to image captioning tasks and improves its performance in two aspects by adding a KL divergence term and leveraging knowledge graphs. Experimental results on benchmark datasets show the effectiveness of the proposed method.
The Transformer model has achieved very good results in machine translation tasks. In this paper, we adopt the Transformer model for the image captioning task. To promote the performance of image captioning, we improve the Transformer model from two aspects. First, we augment the maximum likelihood estimation (MLE) with an extra Kullback-Leibler (KL) divergence term to distinguish the difference between incorrect predictions. Second, we introduce a method to help the Transformer model generate captions by leveraging the knowledge graph. Experiments on benchmark datasets demonstrate the effectiveness of our method. (c) 2021 Elsevier B.V. All rights reserved.
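As a rough illustration of the first improvement, the sketch below augments a standard cross-entropy (MLE) objective with a KL divergence term, assuming the extra term compares the decoder's token distribution with a soft reference distribution that assigns different mass to different incorrect tokens (for example, based on similarity to the ground-truth word). The function name, the `soft_targets` construction, and the `kl_weight` hyperparameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mle_with_kl_loss(logits, targets, soft_targets, kl_weight=0.1):
    """Cross-entropy (MLE) loss augmented with a KL divergence term.

    logits:       (batch, seq_len, vocab) raw decoder outputs
    targets:      (batch, seq_len) ground-truth token ids
    soft_targets: (batch, seq_len, vocab) reference distribution that scores
                  incorrect tokens unevenly (hypothetical; e.g. built from
                  word-embedding similarity to the ground-truth token)
    kl_weight:    trade-off between the two terms (assumed hyperparameter)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard MLE / negative log-likelihood on the ground-truth tokens.
    nll = F.nll_loss(log_probs.transpose(1, 2), targets)
    # KL(soft_targets || model): spreads the remaining probability mass the
    # way the reference distribution does, so that not all incorrect
    # predictions are penalised identically.
    kl = F.kl_div(log_probs, soft_targets, reduction="batchmean")
    return nll + kl_weight * kl
```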
