Proceedings Paper

Texture BERT for Cross-modal Texture Image Retrieval

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3511808.3557710

Keywords

cross-modal search; retrieval; computer vision

The Texture BERT model describes the visual attributes of texture in natural language. It captures fine detail in texture images with group-wise compact bilinear pooling and improves image-text matching with self-attention transformer layers.
We propose Texture BERT, a model describing visual attributes of texture using natural language. To capture the rich details in texture images, we propose a group-wise compact bilinear pooling method, which represents the texture image by a set of visual patterns. The similarity between the texture image and the corresponding language description is determined by the cross-matching between the set of visual patterns from the texture image and the set of word features from the language description. We also exploit the self-attention transformer layers to provide the cross-modal context and enhance the effectiveness of matching. Our efforts achieve state-of-the-art accuracy on both text retrieval and image retrieval tasks, demonstrating the effectiveness of the proposed Texture BERT model in describing texture through natural language.
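The cross-matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, so the one used here is an assumption: each word feature is matched to its most similar visual pattern by cosine similarity, and the per-word maxima are averaged. The function names (`cosine`, `cross_match_score`, `rank_images`) and the toy feature vectors are hypothetical, and the sketch leaves out both the group-wise compact bilinear pooling that would produce the visual patterns and the transformer layers that supply cross-modal context.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_match_score(visual_patterns, word_features):
    """Score one image/description pair from two sets of feature vectors.

    Assumption (not taken from the paper): every word is matched to its
    most similar visual pattern, and the per-word maxima are averaged.
    """
    if not visual_patterns or not word_features:
        raise ValueError("both feature sets must be non-empty")
    best_per_word = [
        max(cosine(word, pattern) for pattern in visual_patterns)
        for word in word_features
    ]
    return sum(best_per_word) / len(best_per_word)


def rank_images(word_features, gallery):
    """Return image ids sorted by descending score for one description."""
    scores = {
        image_id: cross_match_score(patterns, word_features)
        for image_id, patterns in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional features, made up purely for illustration.
gallery = {
    "striped": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "dotted": [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
}
query_words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ranking = rank_images(query_words, gallery)
```

Swapping the roles of the two sets (ranking descriptions for a given image) would give the text-retrieval direction. A learned model would replace the fixed cosine rule with features and matching weights trained end to end.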
