Proceedings Paper

Texture BERT for Cross-modal Texture Image Retrieval

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3511808.3557710

Keywords

cross-modal search; retrieval; computer vision

Texture BERT describes the visual attributes of textures in natural language. It captures rich detail in texture images with group-wise compact bilinear pooling and improves matching with self-attention transformer layers.
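
Compact bilinear pooling approximates the very large outer-product (bilinear) feature of a CNN with a short random projection. The following minimal sketch, assuming the standard Tensor Sketch projection from the compact bilinear pooling literature, shows that idea only. It is not the paper's group-wise variant, whose grouping details are not given here. All function names are hypothetical, and the FFT used in practice is replaced by a direct circular convolution to keep the example dependency-free.

```python
import random
from typing import List, Sequence, Tuple

Sketch = Tuple[List[int], List[int]]  # (hash indices h, signs s)


def make_count_sketch(in_dim: int, out_dim: int, seed: int) -> Sketch:
    """Draw a random hash h: [in_dim] -> [out_dim] and signs s: [in_dim] -> {+1, -1}."""
    rng = random.Random(seed)
    h = [rng.randrange(out_dim) for _ in range(in_dim)]
    s = [rng.choice((1, -1)) for _ in range(in_dim)]
    return h, s


def count_sketch(x: Sequence[float], sketch: Sketch, out_dim: int) -> List[float]:
    """Project x to out_dim dimensions: y[h[i]] += s[i] * x[i]."""
    h, s = sketch
    y = [0.0] * out_dim
    for i, xi in enumerate(x):
        y[h[i]] += s[i] * xi
    return y


def circular_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Direct O(d^2) circular convolution; real implementations use FFTs instead."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]


def tensor_sketch(x: Sequence[float], sketches: Tuple[Sketch, Sketch], out_dim: int) -> List[float]:
    """Approximate the flattened outer product x (x) x in out_dim dimensions."""
    first, second = sketches
    return circular_convolve(count_sketch(x, first, out_dim), count_sketch(x, second, out_dim))


def compact_bilinear_pool(
    features: Sequence[Sequence[float]], sketches: Tuple[Sketch, Sketch], out_dim: int
) -> List[float]:
    """Sum the tensor sketches of all local features (e.g. CNN feature-map positions)."""
    pooled = [0.0] * out_dim
    for x in features:
        for k, v in enumerate(tensor_sketch(x, sketches, out_dim)):
            pooled[k] += v
    return pooled


if __name__ == "__main__":
    # Two 4-dim local features pooled into one 16-dim descriptor.
    sketches = (make_count_sketch(4, 16, seed=0), make_count_sketch(4, 16, seed=1))
    descriptor = compact_bilinear_pool([[0.2, 1.0, 0.0, 0.5], [1.0, 0.0, 0.3, 0.1]], sketches, 16)
    print(len(descriptor))  # 16
```

Convolving two independent count sketches is equivalent to count-sketching the outer product with the combined hash (h1[i] + h2[j]) mod d. This is why the descriptor stays at out_dim entries instead of in_dim squared.
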
We propose Texture BERT, a model that describes the visual attributes of textures in natural language. To capture the rich details in texture images, we propose a group-wise compact bilinear pooling method that represents each texture image as a set of visual patterns. The similarity between a texture image and its language description is computed by cross-matching the set of visual patterns from the image against the set of word features from the description. We also use self-attention transformer layers to provide cross-modal context and make the matching more effective. Texture BERT achieves state-of-the-art accuracy on both text retrieval and image retrieval, demonstrating its effectiveness in describing texture through natural language.
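
The abstract says image-text similarity comes from cross-matching a set of visual patterns against a set of word features, but it does not state the aggregation rule. The sketch below assumes one plausible choice: cosine similarity, matching each word to its best pattern and each pattern to its best word, then averaging the two mean best-match scores. The functions `cross_match_score` and `rank_images` are hypothetical. The transformer layers that add cross-modal context are omitted, so the input vectors are assumed to be already contextualized.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cross_match_score(patterns: Sequence[Vec], words: Sequence[Vec]) -> float:
    """Hypothetical set-to-set similarity between K visual patterns and T word features.

    Each word is matched to its most similar pattern and each pattern to its most
    similar word; the two mean best-match scores are averaged.
    """
    sim = [[cosine(w, p) for p in patterns] for w in words]  # T x K matrix
    word_side = sum(max(row) for row in sim) / len(words)
    pattern_side = sum(max(col) for col in zip(*sim)) / len(patterns)
    return 0.5 * (word_side + pattern_side)


def rank_images(words: Sequence[Vec], gallery: Sequence[Sequence[Vec]]) -> List[int]:
    """Text-to-image retrieval: gallery indices sorted by descending score."""
    scores = [cross_match_score(patterns, words) for patterns in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two word features
    gallery = [[[0.0, 0.0, 1.0]], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]  # pattern sets
    print(rank_images(query, gallery))  # [1, 0]
```

Scoring in both directions penalizes images that match only some of the words and descriptions that cover only some of the image's patterns. Image-to-text retrieval would rank descriptions with the same score.
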
