4.7 Article

Multi-level knowledge-driven feature representation and triplet loss optimization network for image-text retrieval

Journal

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2023.103575

Keywords

Image-text retrieval; Cross-modal retrieval; Prior knowledge; Triplet loss optimization

Image-text retrieval is important in connecting vision and language. This paper proposes a method that utilizes prior knowledge to enhance feature representations and optimize network training for better retrieval results.
Image-text retrieval plays an important role in connecting vision and language. Existing mainstream approaches focus on fine-grained alignment but overlook two issues: the influence of prior knowledge on model performance, and the limitation of using a fixed margin in the triplet loss. In this paper, we propose a Multi-level Knowledge-driven feature representation and Triplet Loss Optimization Network (MKTLON), which exploits prior knowledge to enhance visual and textual feature representations and uses an adaptive triplet-loss margin to optimize network training. Specifically, we first present an Enhanced feature Representation scheme based on the Self-Attention (ERSA) module, which incorporates prior knowledge, randomly initialized from a uniform distribution, into the matrices K and V of the self-attention mechanism. We then cascade ERSA modules to encode images and texts, obtaining multi-level visual and textual features with prior knowledge. Furthermore, we develop an adaptive margin optimization strategy that models the relevance scores of positive and negative samples as two independent Gaussian distributions and obtains the optimized margin by minimizing the intersection of these two distributions. Extensive experiments on two benchmarks, Flickr30K (155,000 image-text pairs) and MSCOCO (616,435 image-text pairs), show that the proposed MKTLON achieves 5.7% and 4.3% improvements on rSum, respectively, compared to the state-of-the-art method. The source code will be released at https://github.com/FlyCuteBird/MKTLON.
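
The two mechanisms named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulation. The sketch assumes that the prior knowledge is appended to K and V as extra rows (hypothetical `prior_k` and `prior_v` slots drawn from a uniform range of [-1, 1]), that a single attention head is used, and that the overlap of the two Gaussian score distributions is measured by numerical integration. All function names are made up for illustration, and how the paper turns the overlap into a margin value is left open here.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_prior(slots, dim, low=-1.0, high=1.0):
    """Prior-knowledge slots drawn from a uniform distribution (range is assumed)."""
    return [[random.uniform(low, high) for _ in range(dim)] for _ in range(slots)]


def knowledge_attention(x, w_q, w_k, w_v, prior_k, prior_v):
    """Single-head self-attention with prior-knowledge rows appended to K and V.

    Assumption: "incorporating prior knowledge into K and V" is modeled as
    concatenation, so each query attends to n inputs plus m prior slots.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k) + prior_k  # list concatenation: (n + m) rows
    v = matmul(x, w_v) + prior_v
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose K to d x (n + m)
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights


def gaussian_pdf(s, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at score s."""
    return math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def overlap_area(mu_p, sd_p, mu_n, sd_n, steps=20000):
    """Area shared by the positive and negative score densities (midpoint rule)."""
    lo = min(mu_p - 6 * sd_p, mu_n - 6 * sd_n)
    hi = max(mu_p + 6 * sd_p, mu_n + 6 * sd_n)
    h = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        s = lo + (i + 0.5) * h
        area += min(gaussian_pdf(s, mu_p, sd_p), gaussian_pdf(s, mu_n, sd_n))
    return area * h


def triplet_loss(score_pos, score_neg, margin):
    """Hinge-based triplet loss on similarity scores; this is where the margin enters."""
    return max(0.0, margin - score_pos + score_neg)
```

In this sketch a smaller `overlap_area` means the positive and negative relevance scores are better separated. A training loop could track the mean and standard deviation of both score sets per batch and pick the margin passed to `triplet_loss` so that this overlap shrinks, but that update rule is an assumption rather than something the abstract specifies.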
