4.4 Review

A thorough review of models, evaluation metrics, and datasets on image captioning

Journal

IET IMAGE PROCESSING
Volume 16, Issue 2, Pages 311-332

Publisher

WILEY
DOI: 10.1049/ipr2.12367

Funding

  1. Higher Agricultural College Branch of the China Educational Technology Association [C21ZD05]

This survey provides a comprehensive overview of image captioning methods, categorizing them based on techniques and discussing their advantages and limitations. By quantitatively comparing related state-of-the-art studies, recent trends and future directions in image captioning are determined. The ultimate goal is to serve as a tool for understanding existing literature and highlighting future directions in the field for the benefit of Computer Vision and Natural Language Processing communities.
Image captioning is the task of automatically generating descriptive sentences for a query image. As an emerging visual task, it has recently received widespread attention from the computer vision and natural language processing communities. Both components have evolved considerably by exploiting object regions, attributes, attention mechanisms, entity recognition with novelties, and training strategies. However, despite the impressive results, the research has not yet reached a conclusive answer. This survey aims to provide a comprehensive overview of image captioning methods, from technical architectures to benchmark datasets, evaluation metrics, and a comparison of state-of-the-art methods. In particular, image captioning methods are divided into categories based on the technique adopted. Representative methods in each class are summarized, and their advantages and limitations are discussed. Moreover, many related state-of-the-art studies are quantitatively compared to determine recent trends and future directions in image captioning. The ultimate goal of this work is to serve as a tool for understanding the existing literature and highlighting future directions in image captioning, from which the Computer Vision and Natural Language Processing communities may benefit.
