Proceedings Paper

StacMR: Scene-Text Aware Cross-Modal Retrieval

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Volume -, Issue -, Pages 2219-2229

Publisher

IEEE

DOI: 10.1109/WACV48630.2021.00227

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology

Summary

This paper introduces a new dataset for cross-modal retrieval involving scene-text instances, proposes approaches that leverage scene text, and reports experiments confirming the benefits of using scene text.

Abstract

Recent models for cross-modal retrieval have benefited from an increasingly rich understanding of visual scenes, afforded by scene graphs and object interactions, to mention a few. This has resulted in improved matching between the visual representation of an image and the textual representation of its caption. Yet current visual representations overlook a key aspect: the text appearing in images, which may contain crucial information for retrieval. In this paper, we first propose a new dataset that allows exploration of cross-modal retrieval where images contain scene-text instances. Then, armed with this dataset, we describe several approaches that leverage scene text, including a better scene-text aware cross-modal retrieval method that uses specialized representations for text from the captions and text from the visual scene and reconciles them in a common embedding space. Extensive experiments confirm that cross-modal retrieval approaches benefit from scene text and highlight interesting research questions worth exploring further. Dataset and code are available at europe.naverlabs.com/stacmr.
