Article

CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval

Journal

IEEE Transactions on Circuits and Systems for Video Technology
Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TCSVT.2020.3017344

Keywords

Semantics; Task analysis; Training; Generators; Optimization; Gallium nitride; Marine vehicles; Retrieval; cross-modal retrieval; adversarial learning

Funding

  1. National Key Research and Development Program of China [2018YFB0505400, 2019YFB1405703, TC190A4DA/3]
  2. NSFC [61672307]
  3. Tsinghua-Kuaishou Institute of Future Media Data


A novel cross memory network with pair discrimination (CMPD) is proposed for image-text cross-modal retrieval and shows superior performance compared with state-of-the-art approaches. The method uses a cross memory, treated as a set of latent concepts, together with a pair discrimination loss to capture semantic relationships efficiently.
Cross-modal retrieval with deep neural networks aims to retrieve relevant data across different modalities. Its performance remains unsatisfactory for two reasons. First, most previous methods fail to incorporate the common knowledge shared among modalities when predicting item representations. Second, the semantic relationships indicated by class labels, an important clue for inferring similarities between cross-modal items, are still insufficiently exploited. To address these issues, we propose a novel cross memory network with pair discrimination (CMPD) for image-text cross-modal retrieval, whose main contributions are twofold. First, we propose the cross memory, a learnable set of latent concepts that captures the common knowledge shared among modalities; it is fused into each modality through an attention mechanism so that representations are predicted discriminatively. Second, we propose the pair discrimination loss, which discriminates the modality labels and class labels of item pairs and thereby captures the semantic relationships among these labels efficiently. Comprehensive experimental results show that our method outperforms state-of-the-art approaches in image-text retrieval.
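To make the two ideas in the abstract more concrete, below is a minimal, hypothetical PyTorch sketch of how a shared cross memory could be attended over and fused into each modality's features, plus a pair-level classification loss over class labels. It illustrates the mechanism as described in the abstract, not the authors' implementation; all module names, dimensions, the scaled dot-product attention, and the concatenation-based fusion are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossMemoryFusion(nn.Module):
    """Hypothetical sketch: a memory of latent concepts shared by both
    modalities; each modality attends over it and fuses the summary back
    into its own features."""

    def __init__(self, feat_dim=512, num_concepts=64):
        super().__init__()
        # Learnable cross memory: one row per latent concept (assumed shape).
        self.memory = nn.Parameter(torch.randn(num_concepts, feat_dim) * 0.02)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, x):
        # x: (batch, feat_dim) image or text features.
        # Attention of each item over the shared latent concepts.
        attn = F.softmax(x @ self.memory.t() / x.size(-1) ** 0.5, dim=-1)
        # Per-item memory summary, fused with the original feature.
        summary = attn @ self.memory                        # (batch, feat_dim)
        return F.normalize(self.fuse(torch.cat([x, summary], dim=-1)), dim=-1)


def pair_discrimination_loss(img_emb, txt_emb, labels, classifier):
    """Hypothetical sketch of a pair-level loss: a classifier receives a
    joint representation of an (image, text) pair and predicts its class
    label, so pairs are discriminated by their semantics.  The paper's
    loss also discriminates modality labels; that part is omitted here."""
    pair_repr = torch.cat([img_emb, txt_emb], dim=-1)       # (batch, 2*feat_dim)
    return F.cross_entropy(classifier(pair_repr), labels)
```

The same CrossMemoryFusion module (and hence the same memory parameter) would be applied to both the image and the text branches, which is what makes the memory "cross": the latent concepts are shared knowledge rather than modality-specific parameters.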
