Article

Multi-label adversarial fine-grained cross-modal retrieval

Publisher

ELSEVIER
DOI: 10.1016/j.image.2023.117018

Keywords

Common representation; Transformer; Adversarial learning; Cross-modal retrieval


Abstract
Most supervised cross-modal approaches transform features into a common representation space in which semantic similarity can be measured directly. However, modality-specific features persist in the common semantic space, and most methods cannot fully eliminate them. To bridge the semantic gap and eliminate modality-specific features, we propose a novel Multi-label Adversarial Fine-grained Cross-modal Retrieval method based on Transformer (MLAT). MLAT constructs a semantic consistency enhanced module (SCE), which comprises a semantic mask attention module and a Transformer-based fine-grained feature generator; it learns fine-grained semantic information to preserve high-level semantic relevance and eliminate modality-specific features. To narrow the distance between common representations and further eliminate modality-specific features, we construct a multi-stage adversarial learning module that optimizes the feature representations. Furthermore, we design a label graph network based on the graph attention network (GAT) to better explore the semantic correlations between labels and to learn a classifier. Experiments on three benchmark datasets demonstrate the superiority of the MLAT method.
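The label graph network mentioned in the abstract builds on graph attention. As a rough illustration of how a single GAT layer propagates information between label embeddings, here is a minimal NumPy sketch; the adjacency matrix, feature sizes, and random weights are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """One graph attention (GAT) layer over a toy label graph.

    H: (N, F) label embeddings; A: (N, N) adjacency with self-loops;
    W: (F, Fp) shared projection; a: (2*Fp,) attention vector.
    Returns updated embeddings (N, Fp) and attention weights (N, N).
    """
    Z = H @ W                                  # project node features
    Fp = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), via a split of `a`
    src = Z @ a[:Fp]                           # contribution of node i
    dst = Z @ a[Fp:]                           # contribution of node j
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    e = np.where(A > 0, e, -np.inf)            # mask non-neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over neighbours
    return alpha @ Z, alpha

# Toy example: 4 labels in a chain graph (plus self-loops).
rng = np.random.default_rng(0)
N, F, Fp = 4, 8, 5
H = rng.normal(size=(N, F))
A = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
out, alpha = gat_layer(H, A, rng.normal(size=(F, Fp)), rng.normal(size=2 * Fp))
print(out.shape)  # (4, 5)
```

Each label's updated embedding is an attention-weighted mix of its neighbours' projected features, which is one way such a layer can capture label co-occurrence structure before feeding a classifier.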

