Proceedings Paper

Cross-Probe BERT for Fast Cross-Modal Search

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3477495.3531826

Keywords

Cross-modal Retrieval; Multimedia Search; Cross-modal BERT

Abstract

Owing to the effectiveness of cross-modal attention, text-vision BERT models have achieved excellent performance in text-image retrieval. Nevertheless, because of their pairwise input, cross-modal attention makes these models computationally expensive for text-vision retrieval, so deploying them for large-scale cross-modal retrieval in real applications is normally impractical. To address this inefficiency in existing text-vision BERT models, in this work we develop a novel architecture, cross-probe BERT. It devises a small number of text and vision probes, and cross-modal attention is achieved efficiently through interactions between the text and vision probes. It has a lightweight computation cost while still effectively exploiting cross-modal attention. Systematic experiments on public benchmarks demonstrate the excellent effectiveness and efficiency of our cross-probe BERT.
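The efficiency gain described above comes from attending between a few probe summaries instead of between all token pairs. The following is a minimal sketch of that idea, not the authors' implementation: the probe vectors, dimensions, and token counts are illustrative, and the probes would be learned parameters in practice. Each modality's probes first summarize that modality's tokens, and cross-modal attention then runs only between the two small summary sets.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    # standard scaled dot-product attention
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores) @ values

rng = np.random.default_rng(0)
d = 64                   # embedding dimension (illustrative)
n_text, n_img = 40, 36   # token counts per modality
k = 4                    # probes per modality, k << n_text, n_img

text_tokens = rng.normal(size=(n_text, d))
image_tokens = rng.normal(size=(n_img, d))
text_probes = rng.normal(size=(k, d))    # learned parameters in practice
image_probes = rng.normal(size=(k, d))

# 1) each probe set summarizes its own modality's tokens
text_summary = attend(text_probes, text_tokens, text_tokens)      # (k, d)
image_summary = attend(image_probes, image_tokens, image_tokens)  # (k, d)

# 2) cross-modal attention runs only between the k-probe summaries,
#    costing O(k^2) interactions instead of O(n_text * n_img)
text_cross = attend(text_summary, image_summary, image_summary)
image_cross = attend(image_summary, text_summary, text_summary)

print(text_cross.shape, image_cross.shape)  # (4, 64) (4, 64)
```

With k = 4 probes, the cross-modal step above computes 16 pairwise scores rather than the 1440 a full token-to-token attention would need, which is the kind of saving that makes pre-computing per-item representations for large-scale retrieval feasible.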

