Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 31, Issue 6, Pages 2427-2437
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.3017344
Keywords
Semantics; Task analysis; Training; Generators; Optimization; Gallium nitride; Marine vehicles; Retrieval; cross-modal retrieval; adversarial learning
Funding
- National Key Research and Development Program of China [2018YFB0505400, 2019YFB1405703, TC190A4DA/3]
- NSFC [61672307]
- Tsinghua-Kuaishou Institute of Future Media Data
A novel cross memory network with pair discrimination (CMPD) is proposed for image-text cross-modal retrieval, outperforming state-of-the-art approaches. The method uses a cross memory, treated as a set of latent concepts, together with a pair discrimination loss to capture semantic relationships efficiently.
Cross-modal retrieval using deep neural networks aims to retrieve relevant data across two different modalities. The performance of cross-modal retrieval remains unsatisfactory due to two problems. First, most previous methods fail to incorporate the common knowledge shared among modalities when predicting item representations. Second, the semantic relationships indicated by class labels, an important clue for inferring similarities between cross-modal items, are still insufficiently utilized. To address these issues, we propose a novel cross memory network with pair discrimination (CMPD) for image-text cross-modal retrieval, whose main contributions are twofold. First, we propose the cross memory, a set of latent concepts that captures the common knowledge shared among modalities. It is learnable and is fused into each modality through an attention mechanism, which aims to predict discriminative representations. Second, we propose the pair discrimination loss, which discriminates the modality labels and class labels of item pairs and thereby efficiently captures the semantic relationships among them. Comprehensive experimental results show that our method outperforms state-of-the-art approaches in image-text retrieval.
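The cross-memory fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the number of memory slots, the feature dimensionality, and the residual-style fusion are assumptions made only to show how a shared, learnable memory could be attended over and fused into each modality's features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for the attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_with_cross_memory(features, memory):
    """Attend each item's feature over the shared memory slots
    (latent concepts) and fuse the read-out back into the feature.

    features: (batch, d) modality-specific features
    memory:   (k, d) shared latent concepts, common to both modalities
    """
    scores = features @ memory.T          # (batch, k) similarity to each concept
    weights = softmax(scores, axis=-1)    # attention distribution over slots
    read = weights @ memory               # (batch, d) memory read-out
    return features + read                # residual fusion (an assumption)

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))         # 8 hypothetical latent concepts
img = rng.normal(size=(4, 16))            # image-side features
txt = rng.normal(size=(4, 16))            # text-side features

# The same memory is shared across modalities, which is what lets it
# carry common knowledge between image and text representations.
img_fused = fuse_with_cross_memory(img, memory)
txt_fused = fuse_with_cross_memory(txt, memory)
```

In the paper the memory would be a learnable parameter updated jointly with both encoders; here it is fixed random data purely for illustration.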