Article

Modality-Invariant Asymmetric Networks for Cross-Modal Hashing

Journal

IEEE Transactions on Knowledge and Data Engineering

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2022.3144352

Keywords

Semantics; Binary codes; Electronic mail; Training; Representation learning; Measurement; Feature extraction; Deep asymmetric learning; modality-alignment network; binary code learning; cross-modal hashing


This paper proposes a Modality-Invariant Asymmetric Networks (MIAN) architecture that preserves asymmetric intra- and inter-modal similarities under a probabilistic modality alignment framework. MIAN incorporates pairwise, piecewise, and transformed semantics into a unified semantic-preserving hash-code learning scheme and outperforms state-of-the-art cross-modal hashing methods.
Cross-modal hashing has attracted considerable attention and achieved great success in many cross-media similarity search applications owing to its prominent computational efficiency and low storage overhead. However, it remains challenging to effectively exploit multilevel semantics over the entire database so as to jointly bridge the semantic and heterogeneity gaps across different modalities. In this paper, we propose a novel Modality-Invariant Asymmetric Networks (MIAN) architecture, which explores asymmetric intra- and inter-modal similarity preservation under a probabilistic modality alignment framework. Specifically, an intra-modal asymmetric network is conceived to capture the query-vs-all internal pairwise similarities for each modality in a probabilistic asymmetric learning manner. Moreover, an inter-modal asymmetric network is deployed to fully harness the cross-modal semantic similarities, supported by a maximum inner product search formulation between two distinct hash embeddings. In particular, pairwise, piecewise, and transformed semantics are jointly integrated into one unified semantic-preserving hash-code learning scheme. Furthermore, we construct a modality alignment network to distill redundancy-free visual features and maximize the conditional bottleneck information between different modalities. Such a network can close the heterogeneity and domain shift across modalities, enabling the model to yield discriminative modality-invariant hash codes. Extensive experiments demonstrate that our MIAN approach outperforms state-of-the-art cross-modal hashing methods.
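The inter-modal objective described above hinges on an inner-product relation between the hash embeddings of two modalities. As a rough illustration of that idea only, the sketch below implements a generic asymmetric inner-product similarity-preserving loss common in asymmetric hashing work; it is not the authors' MIAN objective, and every name in it (asymmetric_similarity_loss, code_length, the toy tensors) is hypothetical.

```python
# Illustrative sketch only: a generic asymmetric inner-product
# similarity-preserving loss in the spirit described by the abstract.
# This is NOT the authors' MIAN implementation; names and shapes are assumptions.
import torch

def asymmetric_similarity_loss(query_embeddings: torch.Tensor,
                               database_codes: torch.Tensor,
                               similarity: torch.Tensor,
                               code_length: int) -> torch.Tensor:
    """Penalize the gap between inner products of continuous query embeddings
    and fixed binary database codes, and the scaled semantic similarity
    matrix (entries in {-1, +1})."""
    # Relax the query side to (-1, 1) so gradients can flow,
    # while the database side stays strictly binary (the asymmetry).
    u = torch.tanh(query_embeddings)           # (n_query, code_length)
    inner = u @ database_codes.t()             # (n_query, n_database)
    return ((inner - code_length * similarity) ** 2).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    r = 32                                      # assumed hash code length
    u = torch.randn(8, r, requires_grad=True)   # query-side network outputs
    b = torch.sign(torch.randn(100, r))         # fixed binary database codes
    s = torch.sign(torch.randn(8, 100))         # +1 similar / -1 dissimilar pairs
    loss = asymmetric_similarity_loss(u, b, s, r)
    loss.backward()                             # gradients reach the query side only
    print(float(loss))
```

The asymmetry lies in treating the query side as a continuous, trainable relaxation while the database side remains strictly binary, which matches the general query-vs-database pattern the abstract alludes to for its intra- and inter-modal networks.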

