Article

Remote sensing image caption generation via transformer and reinforcement learning

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 79, Issue 35-36, Pages 26661-26682

Publisher

SPRINGER
DOI: 10.1007/s11042-020-09294-7

Keywords

Transformer; Remote sensing image captioning; Attention mechanisms; Convolutional neural network; Reinforcement learning

Funding

  1. Fundamental Research Funds for the Central Universities, China [2017XKQY082]


Image captioning is the task of generating a natural-language description of a given image, and it plays an essential role in enabling machines to understand image content. Remote sensing image captioning is a branch of this field. Most current remote sensing image captioning models fail to fully exploit the semantic information in images and suffer from overfitting induced by the small size of the available datasets. To this end, we propose a new model that uses the Transformer to decode image features into target sentences. To make the Transformer more adaptive to the remote sensing image captioning task, we additionally employ dropout layers, residual connections, and adaptive feature fusion in the Transformer. Reinforcement learning is then applied to improve the quality of the generated sentences. We demonstrate the validity of the proposed model on three remote sensing image captioning datasets. Our model obtains higher scores on all seven metrics on the Sydney dataset and the Remote Sensing Image Caption Dataset (RSICD), and on four metrics on the UCM dataset, indicating that the proposed method outperforms previous state-of-the-art models in remote sensing image caption generation.
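The reinforcement learning stage described in the abstract is, in captioning work, commonly realized as self-critical sequence training: a caption sampled from the model is scored with an evaluation metric (e.g. CIDEr), and the score of the greedy-decoded caption serves as the baseline, so only captions that beat the model's own greedy output receive a positive learning signal. The abstract does not specify the exact scheme, so the sketch below is an assumption; the function name `self_critical_advantages` and the toy scores are hypothetical.

```python
# Hedged sketch of a self-critical advantage computation, a common way to
# apply RL to caption generation. This is NOT the paper's published code;
# the scoring numbers below are made-up stand-ins for per-image CIDEr scores.

def self_critical_advantages(sampled_scores, greedy_scores):
    """Advantage for each sampled caption: its metric score minus the
    score of the greedy-decoded baseline caption for the same image.
    The policy-gradient loss would then scale each caption's negative
    log-likelihood by this advantage."""
    return [s - g for s, g in zip(sampled_scores, greedy_scores)]

# Toy example with hypothetical per-image scores.
sampled = [1.0, 0.25, 0.5]   # scores of captions sampled from the model
greedy = [0.5, 0.5, 0.5]     # scores of greedy-decoded baseline captions
adv = self_critical_advantages(sampled, greedy)
print(adv)  # -> [0.5, -0.25, 0.0]
```

Only the first caption outperforms its greedy baseline and is reinforced; the second is penalized, and the third receives no gradient, which is the self-critical property that stabilizes training without a learned value baseline.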


