Article

Scene graph captioner: Image captioning based on structural visual representation

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (2019)

Journal

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION

Volume 58, Issue -, Pages 477-485

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jvcir.2018.12.027

Keywords

Image captioning; Scene graph; Structural representation; Attention

Categories

Computer Science, Information Systems; Computer Science, Software Engineering

Funding

National Natural Science Foundation of China [61772359, 61472275, 61502337, 61701341]

Abstract

While deep neural networks have recently achieved promising results on the image captioning task, they do not explicitly use the structural visual and textual knowledge within an image. In this work, we propose the Scene Graph Captioner (SGC) framework for image captioning, which captures the comprehensive structural semantics of a visual scene by explicitly modeling objects, attributes of objects, and relationships between objects. First, we develop an approach that generates the scene graph by learning individual modules on large object, attribute, and relationship datasets. Then, SGC incorporates high-level graph information and visual attention information into a deep captioning framework. Specifically, we propose a novel framework to embed a scene graph into a structural representation that captures both the semantic concepts and the graph topology. Further, we develop a scene-graph-driven method that generates the attention graph by exploiting high internal homogeneity and external inhomogeneity among the nodes in the scene graph. Finally, an LSTM-based framework translates this information into text. We evaluate the proposed framework on a held-out MSCOCO dataset. (C) 2018 Elsevier Inc. All rights reserved.
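
To make the abstract's notion of a scene graph concrete, the following is a minimal sketch of a data structure that holds objects, their attributes, and pairwise relationships, and exposes the graph topology as an adjacency matrix. It is not the authors' implementation: the class name `SceneGraph`, its methods, and the example labels ("man", "horse", "riding") are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container: objects are nodes, relationships are directed edges."""

    objects: List[str] = field(default_factory=list)
    attributes: Dict[int, List[str]] = field(default_factory=dict)
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str, attrs: Tuple[str, ...] = ()) -> int:
        """Add an object node with optional attributes and return its index."""
        self.objects.append(name)
        index = len(self.objects) - 1
        self.attributes[index] = list(attrs)
        return index

    def add_relationship(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed edge subject -> obj labeled with the predicate."""
        self.relationships.append((subject, predicate, obj))

    def adjacency(self) -> List[List[int]]:
        """Return the graph topology as a 0/1 adjacency matrix over objects."""
        n = len(self.objects)
        matrix = [[0] * n for _ in range(n)]
        for subject, _, obj in self.relationships:
            matrix[subject][obj] = 1
        return matrix

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return relationships as readable (subject, predicate, object) triples."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relationships]


# Illustrative usage with made-up labels.
graph = SceneGraph()
man = graph.add_object("man", ("young",))
horse = graph.add_object("horse", ("brown",))
graph.add_relationship(man, "riding", horse)
```

In a captioning pipeline of the kind the abstract describes, the node labels and the adjacency matrix would be inputs to an embedding step; how SGC actually performs that embedding is not specified here.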
