Article

CycleMatch: A cycle-consistent embedding network for image-text matching

Journal

PATTERN RECOGNITION
Volume 93, Pages 365-379

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2019.05.008

Keywords

Image-text matching; Embedding; Deep neural networks; Late-fusion inference

Funding

  1. LIACS Media Lab at Leiden University [2006002026]
  2. National Natural Science Foundation of China [61872379]

Abstract

In numerous multimedia and multi-modal tasks, from image and video retrieval to zero-shot recognition to multimedia question answering, bridging image and text representations plays an important, and in some cases indispensable, role. To narrow the modality gap between vision and language, prior approaches attempt to discover correlated semantics in a common feature space. However, these approaches neglect intra-modal semantic consistency when learning inter-modal correlations. To address this problem, we propose cycle-consistent embeddings in a deep neural network for matching visual and textual representations. Our approach, named CycleMatch, maintains both inter-modal correlations and intra-modal consistency by cascading dual mappings and reconstructed mappings in a cyclic fashion. Moreover, to achieve robust inference, we propose two late-fusion approaches: average fusion and adaptive fusion. Both effectively integrate the matching scores of different embedding features without increasing network complexity or training time. In experiments on cross-modal retrieval, we present comprehensive results verifying the effectiveness of the proposed approach, which achieves state-of-the-art performance on two well-known multi-modal datasets, Flickr30K and MSCOCO. (C) 2019 Elsevier Ltd. All rights reserved.
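
The cyclic mappings and late-fusion inference described in the abstract can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the paper's implementation: the linear mappings, feature dimensions, MSE reconstruction loss, and the softmax weighting used for adaptive fusion are all illustrative choices introduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CycleEmbedSketch(nn.Module):
    # Hypothetical module: dual mappings project each modality into the
    # other's space; reconstructed mappings close the cycle so a loss can
    # enforce intra-modal consistency. Layer types/sizes are assumptions.
    def __init__(self, img_dim=2048, txt_dim=1024, emb_dim=512):
        super().__init__()
        self.img2txt = nn.Linear(img_dim, emb_dim)      # dual mapping: image -> text space
        self.txt2img = nn.Linear(txt_dim, emb_dim)      # dual mapping: text -> image space
        self.back_to_img = nn.Linear(emb_dim, img_dim)  # reconstructed mapping to image space
        self.back_to_txt = nn.Linear(emb_dim, txt_dim)  # reconstructed mapping to text space

    def forward(self, v, t):
        v_txt = self.img2txt(v)          # image feature embedded in the text space
        t_img = self.txt2img(t)          # text feature embedded in the image space
        v_rec = self.back_to_img(v_txt)  # cycle the image feature back to its own space
        t_rec = self.back_to_txt(t_img)  # cycle the text feature back to its own space
        return v_txt, t_img, v_rec, t_rec

def cycle_consistency_loss(v, t, v_rec, t_rec):
    # Intra-modal consistency: reconstructions should match the originals.
    # MSE is an assumed stand-in for the paper's training objective.
    return F.mse_loss(v_rec, v) + F.mse_loss(t_rec, t)

def average_fusion(score_list):
    # Mean of matching-score matrices produced by different embedding features.
    return torch.stack(score_list).mean(dim=0)

def adaptive_fusion(score_list, logits):
    # Weighted sum with softmax-normalized weights, one scalar per score
    # matrix; this weighting is one plausible reading of "adaptive fusion".
    w = F.softmax(logits, dim=0)
    return (torch.stack(score_list) * w.view(-1, 1, 1)).sum(dim=0)

# Example: fuse two hypothetical score matrices for 4 images x 4 texts.
s1, s2 = torch.rand(4, 4), torch.rand(4, 4)
fused_avg = average_fusion([s1, s2])
fused_ada = adaptive_fusion([s1, s2], torch.zeros(2))

Both fusion functions operate only on precomputed matching scores, which is consistent with the abstract's claim that late fusion adds no network complexity or training time.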
