Journal
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022
Volume -, Issue -, Pages 4610-4614
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3511808.3557710
Keywords
cross-modal search; retrieval; computer vision
The Texture BERT model describes the visual attributes of texture in natural language. It captures fine detail in texture images with group-wise compact bilinear pooling and improves image-text matching with self-attention transformer layers.
We propose Texture BERT, a model that describes the visual attributes of texture using natural language. To capture the rich details in texture images, we propose a group-wise compact bilinear pooling method, which represents the texture image as a set of visual patterns. The similarity between a texture image and its language description is determined by cross-matching the set of visual patterns from the image against the set of word features from the description. We also use self-attention transformer layers to provide cross-modal context and make the matching more effective. The model achieves state-of-the-art accuracy on both text retrieval and image retrieval tasks, demonstrating the effectiveness of Texture BERT in describing texture through natural language.
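The abstract does not give implementation details, so the following is only a minimal NumPy sketch of one plausible reading of the pipeline, not the authors' code. It assumes that "group-wise" means splitting the feature channels into equal groups, that compact bilinear pooling is the standard count-sketch and FFT approximation (Tensor Sketch), and that cross-matching is a symmetric max-over-cosine-similarity score. The function names (sketch_matrix, compact_bilinear, groupwise_patterns, cross_match) are hypothetical, and the transformer layers are omitted.

```python
import numpy as np


def sketch_matrix(d, out_dim, rng):
    """Random count-sketch projection: each input dim maps to one signed bucket."""
    h = rng.integers(0, out_dim, size=d)      # bucket index per input dimension
    s = rng.choice([-1.0, 1.0], size=d)       # random sign per input dimension
    m = np.zeros((d, out_dim))
    m[np.arange(d), h] = s
    return m


def compact_bilinear(x, m1, m2):
    """Tensor Sketch pooling of local features x with shape (n, d).

    Circular convolution of two count sketches (computed via FFT)
    approximates the sketch of the outer product x_i x_i^T; the result
    is sum-pooled over the n locations into one (out_dim,) vector.
    """
    out_dim = m1.shape[1]
    f1 = np.fft.rfft(x @ m1, axis=1)
    f2 = np.fft.rfft(x @ m2, axis=1)
    return np.fft.irfft(f1 * f2, n=out_dim, axis=1).sum(axis=0)


def groupwise_patterns(feat, num_groups, out_dim, rng):
    """ASSUMPTION: one pooled 'visual pattern' per channel group.

    feat has shape (n, d); d must be divisible by num_groups.
    Returns an array of shape (num_groups, out_dim).
    """
    patterns = []
    for g in np.split(feat, num_groups, axis=1):
        m1 = sketch_matrix(g.shape[1], out_dim, rng)
        m2 = sketch_matrix(g.shape[1], out_dim, rng)
        patterns.append(compact_bilinear(g, m1, m2))
    return np.stack(patterns)


def cross_match(patterns, words, eps=1e-8):
    """ASSUMPTION: symmetric best-match cosine score between two sets.

    Each pattern is matched to its most similar word and each word to
    its most similar pattern; the two averages are combined equally.
    """
    p = patterns / (np.linalg.norm(patterns, axis=1, keepdims=True) + eps)
    w = words / (np.linalg.norm(words, axis=1, keepdims=True) + eps)
    sim = p @ w.T                              # (num_patterns, num_words)
    return 0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((49, 64))       # e.g. a 7x7 feature map, 64 channels
    words = rng.standard_normal((5, 32))       # 5 word features, already in 32-d
    patterns = groupwise_patterns(feat, num_groups=8, out_dim=32, rng=rng)
    print(patterns.shape, cross_match(patterns, words))
```

In this sketch the word features are assumed to live in the same 32-dimensional space as the pooled patterns; in a trained model both sides would be learned projections, and the retrieval score would rank descriptions for an image (text retrieval) or images for a description (image retrieval).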