Article

Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 24, Pages 3520-3532

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3101642

Keywords

Correlation; Semantics; Task analysis; Adaptation models; Adaptive systems; Birds; Oceans; Cross-modal retrieval; Deep learning; Graph convolutional networks

Funding

  1. National Key Research and Development Program of China [2017YFB1002804]
  2. National Natural Science Foundation of China [62036012, 62072456, 61720106006, 61572503, 61802405, 61872424, 61702509, 61832002, 61936005, U1705262]
  3. Key Research Program of Frontier Sciences, CAS [QYZDJ-SSW-JSC039]
  4. Open Research Projects of Zhejiang Laboratory [2021KE0AB05]
  5. Tencent WeChat Rhino-Bird Focused Research Program

Abstract

In this paper, a novel end-to-end adaptive label-aware graph convolutional network (ALGCN) is proposed for cross-modal retrieval; it learns modality-invariant and discriminative representations through separate instance and label representation learning branches. ALGCN outperforms state-of-the-art cross-modal retrieval methods on the NUS-WIDE, MIRFlickr and MS-COCO benchmark datasets.
The cross-modal retrieval task has attracted continuous attention in recent years with the increasing scale of multi-modal data, and it has broad application prospects including multimedia data management and intelligent search engines. Most existing methods project data of different modalities into a common representation space, where label information is often exploited to distinguish samples from different semantic categories. However, they typically treat each label as an independent individual and ignore the underlying semantic structure of labels. In this paper, we propose an end-to-end adaptive label-aware graph convolutional network (ALGCN) with both an instance representation learning branch and a label representation learning branch, which obtains modality-invariant and discriminative representations for cross-modal retrieval. First, we construct an instance representation learning branch to transform instances of different modalities into a common representation space. Second, we adopt a Graph Convolutional Network (GCN) to learn inter-dependent classifiers in the label representation learning branch. In addition, a novel adaptive correlation matrix is proposed to efficiently explore and preserve the semantic structure of labels in a data-driven manner. Together with a robust self-supervision loss, the GCN can learn an effective and robust correlation matrix for feature propagation. Comprehensive experimental results on three benchmark datasets, NUS-WIDE, MIRFlickr and MS-COCO, demonstrate the superiority of ALGCN over state-of-the-art methods in cross-modal retrieval.
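
The following is a minimal PyTorch sketch of the two-branch design the abstract describes: an instance branch projecting image and text features into a common space, and a label branch in which a GCN over label embeddings produces inter-dependent classifiers using a learnable correlation matrix. All module names, dimensions, and the row-softmax normalization of the learned matrix are illustrative assumptions, not the authors' released implementation; the paper's specific adaptive-matrix construction and self-supervision loss are not reproduced here.

```python
# Hypothetical sketch of the ALGCN-style two-branch architecture.
# Names, dimensions, and normalization choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConvLayer(nn.Module):
    """One GCN layer: H' = A @ H @ W, where A is a label-correlation matrix."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, h, adj):
        # adj: (num_labels, num_labels); in ALGCN this correlation matrix is
        # learned adaptively from data rather than fixed from co-occurrence counts.
        return adj @ h @ self.weight


class TwoBranchModel(nn.Module):
    def __init__(self, img_dim, txt_dim, label_emb_dim, common_dim, num_labels):
        super().__init__()
        # Instance branch: map each modality into a shared representation space.
        self.img_proj = nn.Sequential(nn.Linear(img_dim, common_dim), nn.ReLU(),
                                      nn.Linear(common_dim, common_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, common_dim), nn.ReLU(),
                                      nn.Linear(common_dim, common_dim))
        # Label branch: GCN over label embeddings yields one classifier per label.
        self.gcn1 = GraphConvLayer(label_emb_dim, common_dim)
        self.gcn2 = GraphConvLayer(common_dim, common_dim)
        # Learnable correlation matrix, trained end-to-end (data-driven).
        self.adj = nn.Parameter(torch.eye(num_labels))

    def forward(self, img_feat, txt_feat, label_emb):
        img_common = self.img_proj(img_feat)          # (B, common_dim)
        txt_common = self.txt_proj(txt_feat)          # (B, common_dim)
        adj = F.softmax(self.adj, dim=1)              # row-normalize (assumption)
        h = F.leaky_relu(self.gcn1(label_emb, adj))
        classifiers = self.gcn2(h, adj)               # (num_labels, common_dim)
        # Label predictions: inner product of instance features and classifiers.
        img_logits = img_common @ classifiers.t()     # (B, num_labels)
        txt_logits = txt_common @ classifiers.t()
        return img_common, txt_common, img_logits, txt_logits
```

In this reading, the GCN output acts as a bank of label classifiers applied identically to image and text features, which is one common way to realize "inter-dependent classifiers" for multi-label data; modality invariance would then be encouraged by additional retrieval losses on img_common and txt_common that this sketch omits.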
