4.6 Article

A neural image captioning model with caption-to-images semantic constructor

Journal

NEUROCOMPUTING
Volume 367, Issue -, Pages 144-151

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2019.08.012

Keywords

Image captioning; Semantic reconstructor; Reranking

Funding

  1. National Natural Science Foundation of China [61672440]
  2. Fundamental Research Funds for the Central Universities [ZK1024]
  3. Scientific Research Project of National Language Committee of China [YB135-49]
  4. Beijing Advanced Innovation Center for Language Resources

The currently dominant image captioning models are mostly based on a CNN-LSTM encoder-decoder framework. Although this architecture has achieved remarkable progress, it does not fully capture the encoded image information: the model exploits only the image-to-caption dependency during caption generation. In this paper, we extend the conventional CNN-LSTM image captioning model by introducing a caption-to-images semantic reconstructor, which reconstructs the semantic representations of the input image and its similar images from the hidden states of the decoder. Serving as an auxiliary objective that evaluates the fidelity of the generated caption, the reconstruction score of the semantic reconstructor is combined with the likelihood to refine model training. In this way, the semantics of the input image can be transferred to the decoder more effectively and fully exploited to generate better captions. In addition, at test time, the reconstruction score can be used together with the log likelihood to select a better caption via reranking. Experimental results show that the proposed model significantly improves the quality of the generated captions and outperforms a conventional image captioning model, LSTM-A(5). (C) 2019 Elsevier B.V. All rights reserved.
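
The test-time reranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two scores are combined, so the linear interpolation below, the `weight` parameter, and the names `Candidate` and `rerank` are assumptions made for illustration only, not the authors' implementation. The reconstruction score is assumed to be "higher is better" (for example, a negative reconstruction distance).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate caption, e.g. from beam search (hypothetical container)."""
    caption: str
    log_likelihood: float        # log p(caption | image) from the decoder
    reconstruction_score: float  # assumed: higher means better reconstruction


def rerank(candidates: List[Candidate], weight: float = 1.0) -> List[Candidate]:
    """Sort candidates by log likelihood plus a weighted reconstruction score.

    The linear combination and `weight` are illustrative assumptions;
    the abstract only says the two scores are used together.
    """
    return sorted(
        candidates,
        key=lambda c: c.log_likelihood + weight * c.reconstruction_score,
        reverse=True,  # best combined score first
    )


if __name__ == "__main__":
    beam = [
        Candidate("a dog on grass", log_likelihood=-2.0, reconstruction_score=-3.0),
        Candidate("a brown dog running on grass", log_likelihood=-2.5, reconstruction_score=-1.0),
    ]
    best = rerank(beam, weight=1.0)[0]
    print(best.caption)
```

With `weight=0.0` the ranking reduces to plain likelihood-based selection, so the weight controls how much the reconstruction score can override the decoder's own preference.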