Article

Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing

Journal

MATHEMATICS
Volume 10, Issue 15, Pages: -

Publisher

MDPI
DOI: 10.3390/math10152644

Keywords

cross-modal retrieval; image-text retrieval; cross-modal similarity preserving; hashing algorithm; unsupervised learning

Funding

  1. National Natural Science Foundation of China [61841602]
  2. Natural Science Foundation of Shandong Province of China [ZR2018PF005, ZR2021MF017]
  3. Youth Innovation Science and Technology Team Foundation of Shandong Higher School [2021KJ031]
  4. Fundamental Research Funds for the Central Universities, JLU [93K172021K12]


This paper introduces the task of image-text cross-modal retrieval and the proposed DRNPH method, which achieves cross-modal retrieval in the Hamming space, with constraints for consistent binary codes of similar sample pairs and minimal Hamming distances. Experimental results show that this method outperforms existing methods in various image-text retrieval scenarios.
The image-text cross-modal retrieval task, which aims to retrieve the relevant image from a text query and vice versa, is attracting widespread attention. To respond quickly to large-scale retrieval tasks, we propose Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing (DRNPH), which performs cross-modal retrieval in a common Hamming space and thus offers advantages in storage and efficiency. To enable nearest neighbor search in the Hamming space, we reconstruct both the original intra- and inter-modal neighbor matrices from the binary feature vectors, so that the neighbor relationships among samples of different modalities can be computed directly from Hamming distances. Furthermore, the cross-modal pair-wise similarity preserving constraint requires that similar sample pairs have identical Hamming distances to the anchor; consequently, similar sample pairs share the same binary code and have minimal Hamming distances. Unfortunately, the pair-wise similarity preserving constraint may lead to an imbalanced code problem. We therefore propose a cross-modal triplet relative similarity preserving constraint, which requires the Hamming distances of similar pairs to be less than those of dissimilar pairs, so that the ranking order of samples in the retrieval results can be distinguished. Moreover, a large similarity margin improves the algorithm's robustness to noise. We conduct cross-modal retrieval comparison experiments and an ablation study on two public datasets, MIRFlickr and NUS-WIDE. The experimental results show that DRNPH outperforms state-of-the-art approaches in various image-text retrieval scenarios, and that all three proposed constraints are necessary and effective for boosting cross-modal retrieval performance.
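The triplet relative similarity constraint described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual training objective: the function names, the margin value, and the toy 8-bit codes are hypothetical, chosen only to show how a similar pair must be closer to the anchor in Hamming space than a dissimilar pair by at least a margin.

```python
import numpy as np

def hamming_distance(a, b):
    """Hamming distance between two binary code vectors (0/1 entries)."""
    return int(np.sum(a != b))

def triplet_hamming_violation(anchor, positive, negative, margin=2):
    """Amount by which the constraint d(a, p) + margin <= d(a, n) is violated.

    A return value of 0 means the constraint holds: the similar (positive)
    sample is closer to the anchor than the dissimilar (negative) sample
    by at least `margin` bits, preserving the relative ranking order.
    """
    d_pos = hamming_distance(anchor, positive)
    d_neg = hamming_distance(anchor, negative)
    return max(0, d_pos + margin - d_neg)

# Toy 8-bit codes: anchor and positive are identical (d = 0),
# while the negative differs from the anchor in 5 bits.
anchor   = np.array([0, 1, 1, 0, 1, 0, 0, 1])
positive = np.array([0, 1, 1, 0, 1, 0, 0, 1])
negative = np.array([1, 0, 1, 1, 0, 1, 0, 1])

print(hamming_distance(anchor, positive))                      # 0
print(hamming_distance(anchor, negative))                      # 5
print(triplet_hamming_violation(anchor, positive, negative))   # 0 (constraint satisfied)
```

In a learned-hashing setting, violations like this would be accumulated over sampled triplets and minimized; a larger margin pushes dissimilar pairs further apart, which is the noise-robustness effect the abstract mentions.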

Authors

