Article

Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 29, Pages 694-709

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TIP.2019.2928144

Keywords

Visualization; Semantics; Adaptation models; Computational modeling; Predictive models; Task analysis; Fans; Image captioning; robust estimation; saliency; salient region detection; two-phase learning; visual attribute

Funding

  1. National Natural Science Foundation of China [61572140, 61976057]
  2. Shanghai Municipal R&D Foundation [17DZ1100504, 16JC1420401]
  3. Shanghai Natural Science Foundation [19ZR1417200]
  4. Humanities and Social Sciences Planning Fund of Ministry of Education of China [19YJA630116]
  5. Henry Tippie Endowed Chair Fund from The University of Iowa

Abstract

Visual saliency and semantic saliency are both important for image captioning. However, a single-phase image captioning model benefits little from the limited saliency information available without a saliency predictor. In this paper, a novel saliency-enhanced re-captioning framework based on two-phase learning is proposed to enhance single-phase image captioning. In the framework, both visual and semantic saliency cues are distilled from the first-phase model and fused into the second-phase model for model self-boosting. The visual saliency mechanism generates a saliency map and a saliency mask for an image without learning a saliency predictor. The semantic saliency mechanism sheds light on the properties of caption words tagged as nouns. In addition, a third type of saliency, sample saliency, is proposed to compute the saliency degree of each training sample, which helps make image captioning more robust. How to combine the three types of saliency for a further performance boost is also examined. The framework can treat an image captioning model as a saliency extractor, which may benefit other captioning models and related tasks. Experimental results on both the Flickr30k and MSCOCO datasets show that the saliency-enhanced models obtain promising performance gains.
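The abstract describes distilling a visual saliency map and mask from the first-phase model (without a learned saliency predictor) and a semantic saliency cue based on noun words in the caption. The sketch below is a minimal illustration of how such cues might be combined, assuming per-word spatial attention from a first-phase attention-based captioner; the shapes, the 0.5 threshold, and the function name are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): derive a visual saliency map and
# mask from a first-phase captioning model's per-word attention, weighted by
# a noun-based semantic saliency flag. Shapes, threshold, and random inputs
# below are assumptions for illustration only.
import numpy as np

def distill_saliency(attn, is_noun, thresh=0.5):
    """attn: (T, H, W) spatial attention for each of T generated words.
    is_noun: (T,) flags for caption words POS-tagged as nouns.
    Returns a normalized saliency map and a binary saliency mask,
    without training any separate saliency predictor."""
    weights = is_noun.astype(np.float32)
    if weights.sum() == 0:              # fall back to uniform weights if no nouns
        weights = np.ones_like(weights)
    # Average the attention maps of noun words to form an image-level map.
    sal_map = (attn * weights[:, None, None]).sum(axis=0) / weights.sum()
    # Normalize to [0, 1] and threshold into a saliency mask.
    sal_map = (sal_map - sal_map.min()) / (sal_map.max() - sal_map.min() + 1e-8)
    sal_mask = (sal_map >= thresh).astype(np.float32)
    return sal_map, sal_mask

# Example with random attention over a 7x7 feature grid for a 5-word caption.
attn = np.random.rand(5, 7, 7)
is_noun = np.array([0, 1, 0, 1, 0], dtype=bool)
sal_map, sal_mask = distill_saliency(attn, is_noun)
```

Per the abstract, the distilled map and mask would then be fused into the second-phase model during re-captioning for self-boosting; how that fusion is performed is detailed in the paper itself.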
