Article

Boosting convolutional image captioning with semantic content and visual relationship

Journal

DISPLAYS
Volume 70, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.displa.2021.102069

Keywords

Image captioning; Generative adversarial network; Graph convolution network

A framework combining a CNN-based generator with conditional GAN (CGAN) training and a multi-modal graph convolution network (MGCN) is proposed for image caption generation; it better exploits visual relationships between objects and generates captions with richer semantic content.
Image captioning aims to automatically generate a natural-language sentence describing an image, an important but challenging task that spans computer vision and natural language processing. The task has been dominated by long short-term memory (LSTM) based solutions. Although much progress has been made with LSTM in recent years, LSTM-based models rely on serialized generation of descriptions, which cannot be parallelized and pays little attention to the hierarchical structure of captions. To address this problem, we propose a framework that uses a CNN-based generation model to produce image captions with the help of conditional generative adversarial training (CGAN). Furthermore, a multi-modal graph convolution network (MGCN) is used to exploit visual relationships between objects so that the generated captions carry semantic meaning; the scene graph serves as a bridge connecting objects, attributes and visual relationship information to produce better captions. Extensive experiments are conducted on the MS COCO dataset, and the results show that our method achieves better or comparable scores compared with state-of-the-art methods. Ablation results show that CGAN and MGCN better capture the visual relationships between objects in an image and thus generate captions with richer semantic content.
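To make the two components in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: one graph-convolution step that propagates scene-graph node features (objects, attributes, relationships), and one masked 1-D convolutional decoder layer of the kind used in convolutional (non-LSTM) captioning models. Class names, dimensions, and the placeholder adjacency matrix are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneGraphConv(nn.Module):
    """Hypothetical single GCN layer: X' = ReLU(A_hat @ X @ W),
    mixing each node's features with those of its scene-graph neighbors."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x:   (num_nodes, dim) node features (objects/attributes/relations)
        # adj: (num_nodes, num_nodes) normalized scene-graph adjacency
        return torch.relu(adj @ self.linear(x))

class CausalConvDecoderLayer(nn.Module):
    """Masked temporal convolution: each caption position only sees
    earlier words, so all positions can be trained in parallel."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1          # left-pad only => causal
        self.conv = nn.Conv1d(dim, dim, kernel_size)

    def forward(self, w):
        # w: (batch, dim, seq_len) embedded caption prefix
        return torch.relu(self.conv(F.pad(w, (self.pad, 0))))

# Toy usage with made-up sizes.
nodes = torch.randn(5, 128)               # 5 scene-graph nodes
adj = torch.eye(5)                         # placeholder adjacency
ctx = SceneGraphConv(128)(nodes, adj)      # relation-aware node features
words = torch.randn(2, 128, 10)            # batch of caption embeddings
out = CausalConvDecoderLayer(128)(words)   # all 10 positions in parallel
```

The causal convolution illustrates the abstract's contrast with LSTM decoding: because the mask is enforced by left-padding rather than by recurrence, all caption positions are computed in one forward pass during training. In the paper's setup the CGAN discriminator would additionally score generated captions against the image condition; that component is omitted here.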
