Article

Multi-label adversarial fine-grained cross-modal retrieval

Publisher

ELSEVIER
DOI: 10.1016/j.image.2023.117018

Keywords

Common representation; Transformer; Adversarial learning; Cross-modal retrieval


Abstract
Most supervised cross-modal approaches transform features into a common representation space in which semantic similarity can be measured directly. However, modality-specific features persist in the common semantic space, and most methods cannot fully eliminate them. To bridge the semantic gap and eliminate modality-specific features, we propose a novel Multi-label Adversarial Fine-grained Cross-modal Retrieval method based on Transformer (MLAT). MLAT constructs a semantic consistency enhanced module (SCE), which comprises a semantic mask attention module and a Transformer-based fine-grained feature generator; it learns fine-grained semantic information to preserve high-level semantic relevance and eliminate modality-specific features. To narrow the distance between common representations and further eliminate modality-specific features, we construct a multi-stage adversarial learning module that optimizes the feature representations. Furthermore, we design a label graph network based on the graph attention network (GAT) to better explore the semantic correlations between labels and to learn a classifier. Experiments on three benchmark datasets demonstrate the superiority of the MLAT method.
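The label graph network mentioned in the abstract builds on graph attention. As a rough illustration of how a single GAT layer propagates information between label embeddings, here is a minimal NumPy sketch; the adjacency matrix, feature sizes, and random weights are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """One graph attention (GAT) layer over a toy label graph.

    H: (N, F) label embeddings; A: (N, N) adjacency with self-loops;
    W: (F, Fp) shared projection; a: (2*Fp,) attention vector.
    Returns updated embeddings (N, Fp) and attention weights (N, N).
    """
    Z = H @ W                                  # project node features
    Fp = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), via a split of `a`
    src = Z @ a[:Fp]                           # contribution of node i
    dst = Z @ a[Fp:]                           # contribution of node j
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    e = np.where(A > 0, e, -np.inf)            # mask non-neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over neighbours
    return alpha @ Z, alpha

# Toy example: 4 labels in a chain graph (plus self-loops).
rng = np.random.default_rng(0)
N, F, Fp = 4, 8, 5
H = rng.normal(size=(N, F))
A = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
out, alpha = gat_layer(H, A, rng.normal(size=(F, Fp)), rng.normal(size=2 * Fp))
print(out.shape)  # (4, 5)
```

Each label's updated embedding is an attention-weighted mix of its neighbours' projected features, which is one way such a layer can capture label co-occurrence structure before feeding a classifier.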

