☆ 4.6 Article

A neural image captioning model with caption-to-images semantic constructor

NEUROCOMPUTING (2019)

Journal

NEUROCOMPUTING

Volume 367, Issue -, Pages 144-151

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2019.08.012

Keywords

Image captioning; Semantic reconstructor; Reranking

Category

Computer Science, Artificial Intelligence

Funding

National Natural Science Foundation of China [61672440]
Fundamental Research Funds for the Central Universities [ZK1024]
Scientific Research Project of National Language Committee of China [YB135-49]
Beijing Advanced Innovation Center for Language Resources

Abstract

Most of the currently dominant image captioning models are based on a CNN-LSTM encoder-decoder framework. Although this architecture has achieved remarkable progress, it does not fully capture the encoded image information: the model exploits only the image-to-caption dependency during caption generation. In this paper, we extend the conventional CNN-LSTM image captioning model by introducing a caption-to-images semantic reconstructor, which reconstructs the semantic representations of the input image and its similar images from the hidden states of the decoder. Serving as an auxiliary objective that evaluates the fidelity of the generated caption, the reconstruction score of the semantic reconstructor is combined with the likelihood to refine model training. In this way, the semantics of the input image can be transferred to the decoder more effectively and be fully exploited to generate better captions. In addition, during model testing, the reconstruction score can be used along with the log likelihood to select a better caption via reranking. Experimental results show that the proposed model significantly improves the quality of the generated captions and outperforms a conventional image captioning model, LSTM-A(5). (C) 2019 Elsevier B.V. All rights reserved.
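
The test-time reranking step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the two scores are combined, so the linear interpolation, the `weight` parameter, the `Candidate` type, and the example scores below are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate caption, e.g. from beam search (hypothetical container)."""
    caption: str
    log_likelihood: float        # log P(caption | image) from the decoder
    reconstruction_score: float  # higher means the image semantics are reconstructed better


def combined_score(c: Candidate, weight: float = 1.0) -> float:
    # Assumed combination: log likelihood plus a weighted reconstruction score.
    return c.log_likelihood + weight * c.reconstruction_score


def rerank(candidates: List[Candidate], weight: float = 1.0) -> List[Candidate]:
    # Sort candidates from best to worst under the combined score.
    return sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)


# Made-up scores, chosen only to show that reranking can change the winner.
candidates = [
    Candidate("a dog on a field", log_likelihood=-4.0, reconstruction_score=-3.0),
    Candidate("a dog catching a frisbee", log_likelihood=-4.5, reconstruction_score=-1.0),
]
best = rerank(candidates, weight=1.0)[0]
```

With `weight=0.0` the ranking falls back to log likelihood alone, which in this toy example picks the first caption; with `weight=1.0` the second caption wins because its reconstruction score is higher.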
