Article

Remote sensing image caption generation via transformer and reinforcement learning

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 79, Issue 35-36, Pages 26661-26682

Publisher

SPRINGER
DOI: 10.1007/s11042-020-09294-7

Keywords

Transformer; Remote sensing image captioning; Attention mechanisms; Convolutional neural network; Reinforcement learning

Funding

  1. Fundamental Research Funds for the Central Universities, China [2017XKQY082]


Image captioning is the task of generating a natural-language description of a given image, and it plays an essential role in enabling machines to understand image content. Remote sensing image captioning is a branch of this field. Most current remote sensing image captioning models fail to fully exploit the semantic information in images and suffer from overfitting induced by the small size of the available datasets. To this end, we propose a new model that uses the Transformer to decode image features into target sentences. To make the Transformer more adaptive to the remote sensing image captioning task, we additionally employ dropout layers, residual connections, and adaptive feature fusion in the Transformer. Reinforcement learning is then applied to improve the quality of the generated sentences. We demonstrate the validity of the proposed model on three remote sensing image captioning datasets. Our model obtains higher scores on all seven metrics on the Sydney dataset and the Remote Sensing Image Caption Dataset (RSICD), and on four metrics on the UCM dataset, indicating that the proposed method outperforms previous state-of-the-art models in remote sensing image caption generation.
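The reinforcement learning stage described in the abstract is, in captioning work, commonly realized as self-critical sequence training: a caption sampled from the model is scored with an evaluation metric (e.g. CIDEr), and the score of the greedy-decoded caption serves as the baseline, so only captions that beat the model's own greedy output receive a positive learning signal. The abstract does not specify the exact scheme, so the sketch below is an assumption; the function name `self_critical_advantages` and the toy scores are hypothetical.

```python
# Hedged sketch of a self-critical advantage computation, a common way to
# apply RL to caption generation. This is NOT the paper's published code;
# the scoring numbers below are made-up stand-ins for per-image CIDEr scores.

def self_critical_advantages(sampled_scores, greedy_scores):
    """Advantage for each sampled caption: its metric score minus the
    score of the greedy-decoded baseline caption for the same image.
    The policy-gradient loss would then scale each caption's negative
    log-likelihood by this advantage."""
    return [s - g for s, g in zip(sampled_scores, greedy_scores)]

# Toy example with hypothetical per-image scores.
sampled = [1.0, 0.25, 0.5]   # scores of captions sampled from the model
greedy = [0.5, 0.5, 0.5]     # scores of greedy-decoded baseline captions
adv = self_critical_advantages(sampled, greedy)
print(adv)  # -> [0.5, -0.25, 0.0]
```

Only the first caption outperforms its greedy baseline and is reinforced; the second is penalized, and the third receives no gradient, which is the self-critical property that stabilizes training without a learned value baseline.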


