Journal
PATTERN RECOGNITION LETTERS
Volume 143, Pages 43-49
Publisher
ELSEVIER
DOI: 10.1016/j.patrec.2020.12.020
Keywords
Image captioning; Transformer; Knowledge graph
Funding
- National Key R&D Program of China [2018AAA0100100]
- National Natural Science Foundation of China [61702095]
- Natural Science Foundation of Jiangsu Province [BK20190341]
The Transformer model has achieved strong results in machine translation tasks. In this paper, we adopt the Transformer model for the image captioning task. To improve captioning performance, we enhance the Transformer model in two ways. First, we augment the maximum likelihood estimation (MLE) objective with an extra Kullback-Leibler (KL) divergence term so that the model distinguishes among incorrect predictions rather than penalizing them uniformly. Second, we introduce a method that helps the Transformer model generate captions by leveraging a knowledge graph. Experiments on benchmark datasets demonstrate the effectiveness of our method. (c) 2021 Elsevier B.V. All rights reserved.
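The abstract does not give the exact form of the KL term, so the following is only a minimal sketch of one common way to augment a cross-entropy (MLE) objective with a KL divergence toward a smoothed reference distribution. The function name, the smoothing scheme, and the hyperparameters `alpha` and `eps` are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mle_kl_loss(logits, target, alpha=0.1, eps=0.1):
    """Cross-entropy (MLE) plus a KL term toward a smoothed reference
    distribution, so mass placed on different *incorrect* tokens is
    penalized unevenly. `alpha` and `eps` are hypothetical knobs."""
    probs = softmax(logits)                      # (batch, vocab)
    vocab = logits.shape[-1]
    rows = np.arange(len(target))
    # Standard MLE term: negative log-likelihood of the true token
    ce = -np.log(probs[rows, target])
    # Smoothed reference: most mass on the true token, a little elsewhere
    ref = np.full_like(probs, eps / (vocab - 1))
    ref[rows, target] = 1.0 - eps
    # KL(ref || probs): extra penalty shaped by the full distribution
    kl = (ref * (np.log(ref) - np.log(probs))).sum(axis=-1)
    return (ce + alpha * kl).mean()
```

With plain cross-entropy, two wrong distributions that assign the same probability to the true token get the same loss; the added KL term also looks at where the remaining mass goes, which is the intuition the abstract describes.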