Article

Attention-guided image captioning with adaptive global and local feature fusion

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jvcir.2021.103138

Keywords

Image captioning; Encoder-decoder; Spatial information; Adaptive attention

Funding

  1. Fundamental Research Funds for the Central Universities of China [191010001]
  2. Hubei Key Laboratory of Transportation Internet of Things [2018IOT003, 2020III026GX]
  3. Ministry of Science and Technology, Taiwan [MOST 109-2634-F-007-013]

Abstract

The proposed image captioning scheme based on adaptive spatial information attention (ASIA) effectively extracts the spatial information of salient objects and applies different techniques in the encoding and decoding stages; extensive experiments on two datasets show that it improves captioning performance.
Although attention mechanisms are widely exploited in encoder-decoder neural network-based image captioning frameworks, the relation between the selection of salient image regions and the supervision of spatial information over local and global representation learning has been overlooked, which degrades captioning performance. We therefore propose an image captioning scheme based on adaptive spatial information attention (ASIA), which extracts a sequence of spatial information for salient objects in a local image region or across an entire image. Specifically, in the encoding stage, we extract the object-level visual features of salient objects together with their spatial bounding boxes. We also obtain global feature maps of the entire image, fuse them with the local features, and feed the fused features into the LSTM-based language decoder. In the decoding stage, our adaptive attention mechanism dynamically selects the image regions that correspond to each word of the generated description. Extensive experiments conducted on two datasets demonstrate the effectiveness of the proposed method.
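The abstract describes an adaptive attention step that weighs local (object-level) features against a global image feature at each decoding step. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of one common way to realize such adaptive fusion: the global feature is treated as an extra attention candidate alongside the local region features, so the learned weights decide how much global versus local context enters the fused vector. All weight names (`W_l`, `W_g`, `w_a`) and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(local_feats, global_feat, hidden, W_l, W_g, w_a):
    """Sketch of adaptive global/local attention fusion (illustrative only).

    local_feats : (num_regions, d) object-level features
    global_feat : (d,) whole-image feature
    hidden      : (d,) decoder LSTM hidden state
    W_l, W_g    : (d, d) hypothetical projection matrices
    w_a         : (d,) hypothetical scoring vector

    Returns the fused context vector and the attention weights.
    """
    # Project local and global features, stacking the global feature
    # as one extra "region" so attention can adaptively pick it.
    candidates = np.vstack([local_feats @ W_l, (global_feat @ W_g)[None, :]])
    scores = np.tanh(candidates + hidden) @ w_a   # (num_regions + 1,)
    alpha = softmax(scores)                       # adaptive attention weights
    feats = np.vstack([local_feats, global_feat[None, :]])
    context = alpha @ feats                       # weighted fusion, shape (d,)
    return context, alpha

# Usage: 4 detected regions, feature dimension 8.
rng = np.random.default_rng(0)
d, num_regions = 8, 4
local = rng.standard_normal((num_regions, d))
global_feat = rng.standard_normal(d)
hidden = rng.standard_normal(d)
W_l, W_g = rng.standard_normal((d, d)), rng.standard_normal((d, d))
w_a = rng.standard_normal(d)
context, alpha = adaptive_fusion(local, global_feat, hidden, W_l, W_g, w_a)
```

In a full captioning model the context vector would be concatenated with the word embedding at each LSTM step; here the point is only that the weights over `num_regions + 1` candidates let the decoder shift between local regions and the global view per generated word.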
