4.7 Article

MS(2)GAH: Multi-label semantic supervised graph attention hashing for robust cross-modal retrieval

Journal

PATTERN RECOGNITION
Volume 128, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108676

Keywords

Cross-modal retrieval; Deep hashing; Graph attention network

Funding

  1. Shandong Provincial Natural Science Foundation, China [ZR2020MF006]
  2. Industry-university Research Innovation Foundation of Ministry of Education of China [2021FNA01001]
  3. Major Scientific and Technological Projects of CNPC [ZD2019-183-006]
  4. Open Foundation of State Key Laboratory of Integrated Services Networks (Xidian University) [ISN23-09]

This study proposes a novel deep hashing method, MS(2)GAH, which integrates graph attention networks (GATs) into cross-modal hashing. It uses multi-label annotations to strengthen the semantic relevance between modalities, and an end-to-end label encoder to guide feature extraction in the modality-specific networks, narrowing the modality gap.
Owing to the strong nonlinear representation capability of deep neural networks and the low storage cost and high efficiency of hash learning, deep cross-modal hashing has moved to the forefront of academic research. The key bottleneck for model performance is how to better capture semantic relevance in order to bridge the modality gap. For samples with rich semantics, the primary issue is how to comprehensively explore hidden correlations and establish more precise relationships between modalities. In this work, we propose a novel deep hashing method called Multi-Label Semantic Supervised Graph Attention Hashing (MS(2)GAH), an end-to-end framework that integrates graph attention networks (GATs). It constructs graph features from the adjacency of nodes and assigns different weights to adjacent edges to enhance the robustness of the model. Simultaneously, multi-label annotations are used to bridge the semantic relevance between modalities in a more fine-grained manner. To make better use of rich semantic information, an end-to-end label encoder is designed to mine high-level semantics from multi-label annotations and guide the feature extraction of the modality-specific networks, thereby further narrowing the modality gap. Finally, extensive experiments on four datasets show that MS(2)GAH outperforms the compared baselines. © 2022 Elsevier Ltd. All rights reserved.
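
The abstract's idea of building a graph from node adjacency and weighting adjacent edges by attention can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a standard single-head graph attention layer, an adjacency rule that links two samples when they share at least one label, and sign binarization to obtain hash codes. All function names, shapes, and random weights below are hypothetical and chosen only for illustration.

```python
import numpy as np

def label_adjacency(labels):
    """Link two samples when they share at least one label (assumed rule)."""
    shared = labels @ labels.T            # number of labels each pair shares
    adj = (shared > 0).astype(float)
    np.fill_diagonal(adj, 1.0)            # self-loops: every node attends to itself
    return adj

def gat_layer(features, adj, weight, attn_vec, slope=0.2):
    """Single-head graph attention layer (standard GAT form, not the paper's exact one)."""
    h = features @ weight                 # projected node features, shape (n, d_out)
    d_out = h.shape[1]
    # Raw score e[i, j] = LeakyReLU(a_src . h_i + a_dst . h_j)
    src = h @ attn_vec[:d_out]
    dst = h @ attn_vec[d_out:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj > 0, e, -np.inf)     # non-adjacent pairs get zero attention
    e = e - e.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over each node's neighbours
    return alpha @ h, alpha

def to_hash_codes(embeddings):
    """Binarize real-valued embeddings into {-1, +1} codes."""
    return np.where(embeddings >= 0, 1, -1).astype(np.int8)

# Toy data: 4 samples, 3 possible labels, 8-dim features, 16-bit codes.
rng = np.random.default_rng(0)
labels = np.array([[1, 0, 1],
                   [1, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 8))
weight = rng.normal(size=(8, 16))         # stands in for learned parameters
attn_vec = rng.normal(size=(32,))         # stands in for the learned attention vector

adj = label_adjacency(labels)
embeddings, alpha = gat_layer(features, adj, weight, attn_vec)
codes = to_hash_codes(embeddings)
```

In this toy setup, samples 0 and 3 share no label, so the attention weight between them is zero, while each row of `alpha` still sums to one over the node's neighbours. In a trained model the projection and attention parameters would be learned rather than sampled at random.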
