Article

A New Attention-Based LSTM for Image Captioning

Journal

NEURAL PROCESSING LETTERS
Volume 54, Issue 4, Pages 3157-3171

Publisher

SPRINGER
DOI: 10.1007/s11063-022-10759-z

Keywords

Image caption; Attention; Long short-term memory; Deep learning

Funding

  1. National Science and Technology Major Project [2020YFA0713504]
  2. CERNET Innovation Project [NGII20180309]
  3. Scientific Research Fund of Hunan Provincial Education Department [210153]

This paper proposes an attentional LSTM (ALSTM) for image captioning. Unlike a traditional LSTM, ALSTM refines its input vector by learning from the network's hidden states and sequential context information. ALSTM serves as the decoder in several classical frameworks, demonstrating how to obtain effective visual/context attention. Extensive evaluations show the superiority of ALSTM in generating high-quality image descriptions.
Image captioning aims to describe the content of an image with a complete and natural sentence. Recently, image captioning methods with an encoder-decoder architecture have made great progress, with LSTM becoming the dominant decoder for generating word sequences. However, in the decoding stage, the input vector remains the same at every step and is largely uncorrelated with the previously attended visual parts or generated words. In this paper, we propose an attentional LSTM (ALSTM) and show how to integrate it into state-of-the-art automatic image captioning frameworks. In place of the traditional LSTM in existing models, ALSTM learns to refine its input vector from the network's hidden states and sequential context information. ALSTM can thereby attend to more relevant features, such as spatial attention and visual relations, and pay more attention to the most relevant context words. Moreover, ALSTM is employed as the decoder in several classical frameworks, showing how effective visual/context attention can update the input vector. Extensive quantitative and qualitative evaluations of the modified networks on the Flickr30K and MSCOCO image datasets illustrate the superiority of ALSTM: ALSTM-based methods generate high-quality descriptions by combining sequential context and visual relations.
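The abstract describes the core mechanism: before each decoding step, the LSTM's input vector is refined by attending over visual region features with the previous hidden state, so the next word is conditioned on the most relevant image regions and context. The paper does not publish its equations here, so the following is only a minimal NumPy sketch of one such input-refinement step under assumed shapes; the names `refine_input`, `W_att`, and `W_gate`, and the gated mixing of word embedding and attended context, are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, k, h_dim = 8, 5, 8  # feature dim, number of image regions, hidden dim

# Hypothetical parameters (randomly initialized for the sketch;
# in a real model these would be learned end to end).
W_att = rng.standard_normal((d, h_dim)) * 0.1        # attention projection
W_gate = rng.standard_normal((d, d + h_dim)) * 0.1   # input-refinement gate

def refine_input(x_t, h_prev, V):
    """One ALSTM-style input refinement step (sketch).

    x_t:    current input word embedding, shape (d,)
    h_prev: previous LSTM hidden state,   shape (h_dim,)
    V:      visual region features,       shape (k, d)
    Returns the refined input vector and the attention weights.
    """
    # Attend over visual regions using the previous hidden state.
    scores = V @ (W_att @ h_prev)   # (k,) relevance of each region
    alpha = softmax(scores)         # attention weights, sum to 1
    context = alpha @ V             # (d,) attended visual context

    # Gate how much attended context flows into the refined input,
    # so the decoder can balance language context against vision.
    g = sigmoid(W_gate @ np.concatenate([x_t, h_prev]))
    x_refined = g * x_t + (1.0 - g) * context
    return x_refined, alpha

# Example step with random inputs.
x_t = rng.standard_normal(d)
h_prev = rng.standard_normal(h_dim)
V = rng.standard_normal((k, d))
x_refined, alpha = refine_input(x_t, h_prev, V)
```

The refined vector `x_refined` would then be fed into a standard LSTM cell in place of the raw word embedding, which is the substitution the abstract describes for existing encoder-decoder frameworks.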
