Article

Boosting Cross-Modal Retrieval With MVSE++ and Reciprocal Neighbors

Journal

IEEE ACCESS
Volume 8, Pages 84642-84651

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.2992187

Keywords

Cross-modal retrieval; visual-semantic embedding; scene context; reciprocal neighbors; re-ranking method

Funding

  1. National Natural Science Foundation of China [61671385, 61571354]
  2. Science, Technology, and Innovation Commission of Shenzhen Municipality [JCYJ20190806160210899]

Abstract

In this paper, we propose to boost cross-modal retrieval by mutually aligning images and captions with respect to both features and relationships. First, we propose a multi-feature based visual-semantic embedding (MVSE++) space to retrieve candidates in the other modality, which provides a more comprehensive representation of the visual content of objects and of scene context in images. Thus, we are more likely to find an accurate and detailed caption for an image. However, a caption condenses the image content into a semantic description, so the cross-modal neighboring relationships built from the visual side and from the semantic side are asymmetric. To retrieve better cross-modal neighbors, we propose to re-rank the initially retrieved candidates according to the k nearest reciprocal neighbors in the MVSE++ space. The method is evaluated on the benchmark MSCOCO and Flickr30K datasets with standard metrics. We achieve higher accuracy in caption retrieval and image retrieval at both R@1 and R@10.
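
The reciprocal-neighbor re-ranking idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' MVSE++ implementation: it takes precomputed, L2-normalized image and caption embeddings, ranks candidates by cosine similarity, and adds a simple bonus to candidates that are k-reciprocal neighbors of the query; the function names (`retrieve`, `k_reciprocal_rerank`) and the additive scoring rule are hypothetical.

```python
# Minimal sketch of cross-modal retrieval plus k-reciprocal-neighbor re-ranking
# in a shared embedding space. Names and the additive bonus are illustrative
# assumptions, not the authors' MVSE++ code; embeddings are assumed to be
# precomputed and L2-normalized so that dot products are cosine similarities.
import numpy as np

def retrieve(query_emb, gallery_emb, top_n):
    """Rank gallery items for each query by cosine similarity (descending)."""
    sims = query_emb @ gallery_emb.T                 # (n_query, n_gallery)
    ranks = np.argsort(-sims, axis=1)[:, :top_n]     # indices of top_n candidates
    return ranks, sims

def k_reciprocal_rerank(img_emb, cap_emb, k=10, top_n=100):
    """Re-rank initially retrieved captions for each image query: a candidate
    caption is promoted when the query image is also among that caption's
    own k nearest images, i.e. the pair are k-reciprocal neighbors."""
    init_rank, sims = retrieve(img_emb, cap_emb, top_n)    # image -> caption
    cap_to_img, _ = retrieve(cap_emb, img_emb, k)          # caption -> image

    reranked = np.empty_like(init_rank)
    for q, candidates in enumerate(init_rank):
        # Bonus for candidates whose k nearest images include the query image.
        reciprocal = np.array([q in cap_to_img[c] for c in candidates])
        score = sims[q, candidates] + reciprocal.astype(float)  # simple additive bonus
        reranked[q] = candidates[np.argsort(-score)]
    return reranked

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(20, 64)); img /= np.linalg.norm(img, axis=1, keepdims=True)
    cap = rng.normal(size=(100, 64)); cap /= np.linalg.norm(cap, axis=1, keepdims=True)
    print(k_reciprocal_rerank(img, cap, k=5, top_n=10)[:, :5])
```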

