Article

Boosted Transformer for Image Captioning

Journal

APPLIED SCIENCES-BASEL
Volume 9, Issue 16

Publisher

MDPI
DOI: 10.3390/app9163260

Keywords

image captioning; self-attention; deep learning; transformer

Funding

  1. National Natural Science Foundation of China [61671054]
  2. Beijing Natural Science Foundation [4182038]


Image captioning aims to generate a description of a given image, typically using a Convolutional Neural Network as the encoder to extract visual features and a sequence model as the decoder to generate the description; among sequence models, the self-attention mechanism has recently achieved notable progress. However, this predominant encoder-decoder architecture still has shortcomings. On the encoder side, the extracted visual features lack semantic concepts and therefore do not make full use of the information in the image. On the decoder side, sequence self-attention relies only on word representations; lacking the guidance of visual information, it is easily influenced by the language prior. In this paper, we propose a novel boosted transformer model with two attention modules that address these problems: Concept-Guided Attention (CGA) and Vision-Guided Attention (VGA). Our model applies CGA in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features. In the decoder, we stack VGA, which uses visual information as a bridge to model internal relationships among the sequence and serves as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset demonstrate that our model outperforms state-of-the-art approaches.
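
The abstract describes the two modules only at a high level; the exact formulation is in the paper itself. As an illustration, the sketch below shows one plausible reading in PyTorch, assuming both CGA and VGA are standard scaled dot-product cross-attention with a residual connection. The class name GuidedAttention, the 512-dimensional features, and the region/concept counts are hypothetical placeholders, not the authors' actual implementation.

    import torch
    import torch.nn as nn

    class GuidedAttention(nn.Module):
        """Single-head scaled dot-product cross-attention (hypothetical sketch).

        Queries come from one modality and keys/values from another, so the
        output is the query sequence enriched with the guiding modality.
        """
        def __init__(self, d_model):
            super().__init__()
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            self.scale = d_model ** 0.5

        def forward(self, queries, context):
            q = self.q_proj(queries)                 # (B, Nq, d)
            k = self.k_proj(context)                 # (B, Nc, d)
            v = self.v_proj(context)                 # (B, Nc, d)
            attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)
            return queries + attn @ v                # residual connection

    # Concept-Guided Attention (encoder): visual features query concept
    # embeddings to obtain the "boosted" visual features.
    cga = GuidedAttention(d_model=512)
    visual = torch.randn(2, 36, 512)    # e.g. 36 region features per image
    concepts = torch.randn(2, 5, 512)   # e.g. 5 instance-level concept embeddings
    boosted_visual = cga(visual, concepts)

    # Vision-Guided Attention (decoder): word representations query the visual
    # features, complementing the word-only sequence self-attention.
    vga = GuidedAttention(d_model=512)
    words = torch.randn(2, 20, 512)     # partial caption embeddings
    vision_aware_words = vga(words, boosted_visual)

Under this reading, both modules share one mechanism and differ only in which modality supplies the queries and which supplies the context, which is consistent with the abstract's framing of VGA as an auxiliary path alongside the usual self-attention.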
