Article

Deep sequential fusion LSTM network for image description

Journal

NEUROCOMPUTING
Volume 312, Pages 154-164

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2018.05.086

Keywords

Image description; Long short term memory network; Layer-wise optimization; Deep supervision; Deep sequential fusion

Funding

  1. National Natural Science Foundation of China [61622115, 61472281]
  2. Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning [GZ2015005]
  3. Shanghai Engineering Research Center of Industrial Vision Perception & Intelligent Computing [17DZ2251600]
  4. IBM Shared University Research Awards Program
  5. Scientific Research Foundation of Education Bureau of Jiangxi Province [GJJ170643]

Abstract

Automatic image description is a challenging task: it aims to translate the visual information in an image into natural language that conforms to proper grammar and sentence structure. In this work, an optimal learning framework called the deep sequential fusion based long short term memory network is designed. In the proposed framework, a layer-wise strategy is introduced into the generation process of the recurrent neural network to increase the depth of the language model and produce more abstract and discriminative features. Then, a deep supervision method is developed to enrich the model capacity with extra regularization. Moreover, the prediction scores from all of the auxiliary branches in the language model are fused with the product rule to form the final decision output, which makes further use of the optimized model parameters and hence boosts performance. Experimental results on two public benchmark datasets verify the effectiveness of the proposed approaches: the consensus-based image description evaluation metric (CIDEr) reaches 103.4 on the MSCOCO dataset, and the metric for evaluation of translation with explicit ordering (METEOR) reaches 20.6 on the Flickr30K dataset. (C) 2018 Elsevier B.V. All rights reserved.
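The product-rule fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vocabulary size, and the branch distributions below are all hypothetical. It assumes each auxiliary branch emits a probability distribution over the same vocabulary at a given decoding step, and that fusion means multiplying those distributions element-wise and renormalizing.

```python
import math


def softmax(scores):
    """Normalize a list of real-valued scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def product_rule_fusion(branch_probs, eps=1e-12):
    """Fuse per-branch word distributions with the product rule.

    branch_probs: list of distributions, one per branch (hypothetical layout),
    each a list of probabilities over the same vocabulary.
    The product is computed as a sum of logs, then renormalized with softmax.
    """
    vocab_size = len(branch_probs[0])
    log_scores = [
        sum(math.log(probs[i] + eps) for probs in branch_probs)
        for i in range(vocab_size)
    ]
    return softmax(log_scores)


# Toy example: three branches scoring a four-word vocabulary (made-up numbers).
branches = [
    [0.10, 0.60, 0.20, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.25, 0.40, 0.25, 0.10],
]
fused = product_rule_fusion(branches)
best_word_index = max(range(len(fused)), key=fused.__getitem__)
```

Summing log-probabilities instead of multiplying raw probabilities avoids underflow when the vocabulary is large or many branches are fused. The renormalization keeps the fused scores a valid distribution for the next decoding step.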
