Article

Global-local fusion based on adversarial sample generation for image-text matching

Journal

INFORMATION FUSION
Volume 103, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.inffus.2023.102084

Keywords

Image-text matching; Global-local cognition; Adversarial sample generation; Dynamic fusion; Loss adjustment


In the increasingly popular era of adversarial machine learning (AML), developing more robust and generalized algorithms has become a key research topic. Image-text matching, the foundation of tasks such as video Q&A and text-to-image generation, also faces various attacks in AML. Current image-text matching methods based on the similarity of matching fragments focus only on local matching results and do not establish a comprehensive cognition of the content in text and image, resulting in mismatching of abstract scenes when facing complex attacks. Meanwhile, existing methods are not sensitive enough to identify the internal relationships between objects in different local areas, which also confuses matching. Aiming at the above problems, a global similarity matching module is proposed, which is dynamically fused with local similarity to measure matching results flexibly and improve the understanding of abstract scenes. Furthermore, a global-local cognition fusion training mechanism based on relationship adversarial sample generation is proposed to enhance understanding of the internal relationships between objects in different local areas through adversarial sample generation. A global loss is introduced to train the overall model, and the proportion of global-local loss is adjusted during training to better identify the relationships between objects in different local areas and to avoid the confusion in matching caused by the similarity of matched objects. Experimental results show that the proposed method outperforms the SOTA method by 7.4% (rSum) on the Flickr30K dataset and by 4.0% (rSum, 1K test set) on the MS-COCO dataset. The proposed global-local fusion (GLF) based on adversarial sample generation for image-text matching improves the accuracy and robustness of image-text matching, performs well in the face of security challenges, and promotes the development of visual and linguistic modality fusion.
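The abstract's two central ideas, a dynamic fusion of global and local similarity scores and a training objective whose global-local loss proportion is adjusted, can be sketched as follows. The paper does not publish its interface here, so all function names, the convex-combination fusion form, and the use of a VSE++-style hardest-negative triplet ranking loss are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def fuse_similarity(local_sim, global_sim, alpha=0.5):
    # Illustrative dynamic fusion: a convex combination of the local
    # fragment-level and global scene-level similarity matrices
    # (rows = images, columns = texts); alpha weights the global term.
    return alpha * global_sim + (1.0 - alpha) * local_sim

def triplet_ranking_loss(sim, margin=0.2):
    # Hardest-negative triplet ranking loss commonly used in image-text
    # matching (VSE++-style); sim[i, j] scores image i against text j,
    # with the matched pairs on the diagonal.
    n = sim.shape[0]
    pos = np.diag(sim)
    mask = ~np.eye(n, dtype=bool)
    neg_i2t = np.where(mask, sim, -np.inf).max(axis=1)  # hardest text per image
    neg_t2i = np.where(mask, sim, -np.inf).max(axis=0)  # hardest image per text
    loss = np.maximum(0.0, margin + neg_i2t - pos) \
         + np.maximum(0.0, margin + neg_t2i - pos)
    return loss.mean()

def combined_loss(local_sim, global_sim, alpha=0.5, lam=0.5, margin=0.2):
    # Weighted sum of a global loss (on the fused similarity) and a local
    # loss; lam plays the role of the global-local proportion that the
    # training mechanism adjusts over the course of training.
    l_local = triplet_ranking_loss(local_sim, margin)
    l_global = triplet_ranking_loss(
        fuse_similarity(local_sim, global_sim, alpha), margin)
    return lam * l_global + (1.0 - lam) * l_local
```

In this sketch, increasing `lam` early in training would emphasize the global scene-level cognition, while the local term keeps fragment-level discrimination; the actual adjustment schedule used in the paper is not specified in the abstract.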
