4.6 Article

Scene graph captioner: Image captioning based on structural visual representation

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jvcir.2018.12.027

Keywords

Image captioning; Scene graph; Structural representation; Attention

Funding

  1. National Natural Science Foundation of China [61772359, 61472275, 61502337, 61701341]

While deep neural networks have recently achieved promising results on the image captioning task, they do not explicitly use the structural visual and textual knowledge within an image. In this work, we propose the Scene Graph Captioner (SGC) framework for the image captioning task, which captures the comprehensive structural semantics of a visual scene by explicitly modeling objects, attributes of objects, and relationships between objects. First, we develop an approach to generate the scene graph by learning individual modules on large object, attribute, and relationship datasets. Then, SGC incorporates high-level graph information and visual attention information into a deep captioning framework. Specifically, we propose a novel framework to embed a scene graph into a structural representation that captures both the semantic concepts and the graph topology. Further, we develop a scene-graph-driven method to generate the attention graph by exploiting high internal homogeneity and external inhomogeneity among the nodes in the scene graph. Finally, an LSTM-based framework translates this information into text. We evaluate the proposed framework on a held-out MSCOCO dataset. (C) 2018 Elsevier Inc. All rights reserved.
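
As an illustration only, the kind of structure the abstract describes (objects, attributes of objects, and relationships between objects) can be sketched in Python. This is not the authors' implementation: the class name SceneGraph, its methods, and the example labels are all hypothetical, and the sketch covers only the graph data structure, not the embedding, attention, or LSTM stages.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph: object nodes, per-object attributes, directed relations."""
    objects: list = field(default_factory=list)      # object labels, indexed by node id
    attributes: dict = field(default_factory=dict)   # node id -> list of attribute words
    relations: list = field(default_factory=list)    # (subject id, predicate, object id)

    def add_object(self, label, attrs=()):
        """Add an object node and return its node id."""
        node_id = len(self.objects)
        self.objects.append(label)
        self.attributes[node_id] = list(attrs)
        return node_id

    def add_relation(self, subj, predicate, obj):
        """Add a directed edge subj --predicate--> obj between existing nodes."""
        if not (0 <= subj < len(self.objects) and 0 <= obj < len(self.objects)):
            raise IndexError("relation refers to an unknown node id")
        self.relations.append((subj, predicate, obj))

    def describe(self, node_id):
        """Return the object label prefixed by its attributes."""
        return " ".join(self.attributes[node_id] + [self.objects[node_id]])

    def triples(self):
        """List every relation as a (subject, predicate, object) phrase triple."""
        return [(self.describe(s), p, self.describe(o)) for s, p, o in self.relations]


# Hypothetical example scene: a young man riding a brown horse.
graph = SceneGraph()
man = graph.add_object("man", ["young"])
horse = graph.add_object("horse", ["brown"])
graph.add_relation(man, "riding", horse)
print(graph.triples())
```

Flattening the graph into phrase triples is just one simple way to expose both the semantic concepts and the edges; the paper's structural embedding is a learned representation and is not reproduced here.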
