Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 31, Issue 6, Pages 2427-2437
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.3017344
Keywords
Semantics; Task analysis; Training; Generators; Optimization; Gallium nitride; Marine vehicles; Retrieval; cross-modal retrieval; adversarial learning
Funding
- National Key Research and Development Program of China [2018YFB0505400, 2019YFB1405703, TC190A4DA/3]
- NSFC [61672307]
- Tsinghua-Kuaishou Institute of Future Media Data
A novel cross memory network with pair discrimination (CMPD) is proposed for image-text cross-modal retrieval, outperforming state-of-the-art approaches. The method uses a cross memory, treated as a set of latent concepts, together with a pair discrimination loss to capture semantic relationships efficiently.
Cross-modal retrieval using deep neural networks aims to retrieve relevant data across two different modalities. The performance of cross-modal retrieval remains unsatisfactory due to two problems. First, most previous methods fail to incorporate the common knowledge shared among modalities when predicting item representations. Second, the semantic relationships indicated by class labels, an important clue for inferring similarities between cross-modal items, are still insufficiently utilized. To address these issues, we propose a novel cross memory network with pair discrimination (CMPD) for image-text cross-modal retrieval, whose main contributions are twofold. First, we propose the cross memory, a set of latent concepts that captures the common knowledge shared among modalities. It is learnable and is fused into each modality through an attention mechanism, which aims to predict discriminative representations. Second, we propose the pair discrimination loss, which discriminates the modality labels and class labels of item pairs and thereby efficiently captures the semantic relationships among them. Comprehensive experimental results show that our method outperforms state-of-the-art approaches in image-text retrieval.
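The cross-memory fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the number of memory slots, the feature dimensionality, and the residual-style fusion are assumptions made only to show how a shared, learnable memory could be attended over and fused into each modality's features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for the attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_with_cross_memory(features, memory):
    """Attend each item's feature over the shared memory slots
    (latent concepts) and fuse the read-out back into the feature.

    features: (batch, d) modality-specific features
    memory:   (k, d) shared latent concepts, common to both modalities
    """
    scores = features @ memory.T          # (batch, k) similarity to each concept
    weights = softmax(scores, axis=-1)    # attention distribution over slots
    read = weights @ memory               # (batch, d) memory read-out
    return features + read                # residual fusion (an assumption)

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))         # 8 hypothetical latent concepts
img = rng.normal(size=(4, 16))            # image-side features
txt = rng.normal(size=(4, 16))            # text-side features

# The same memory is shared across modalities, which is what lets it
# carry common knowledge between image and text representations.
img_fused = fuse_with_cross_memory(img, memory)
txt_fused = fuse_with_cross_memory(txt, memory)
```

In the paper the memory would be a learnable parameter updated jointly with both encoders; here it is fixed random data purely for illustration.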