Proceedings Paper

FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/CVPR52688.2022.01371

Keywords

-


Abstract

Fashion image retrieval based on a query pair of a reference image and natural language feedback is a challenging task that requires models to assess fashion-related information from the visual and textual modalities simultaneously. We propose a new vision-language transformer based model, FashionVLP, that brings the prior knowledge contained in large image-text corpora to the domain of fashion image retrieval, and combines visual information from multiple levels of context to effectively capture fashion-related information. While queries are encoded through the transformer layers, our asymmetric design adopts a novel attention-based approach for fusing target image features without involving text or transformer layers in the process. Extensive results show that FashionVLP achieves state-of-the-art performance on benchmark datasets, with a large 23% relative improvement on the challenging FashionIQ dataset, which contains complex natural language feedback.
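To make the asymmetric design more concrete, the sketch below shows one plausible way to fuse multi-level target image features with attention alone, keeping text and transformer layers out of the target branch. It is a minimal illustration under stated assumptions, not the authors' code: the class name AttentionFeatureFusion, the 512-dimensional embedding, the single learnable query token, and the three feature levels are all hypothetical.

```python
import torch
import torch.nn as nn

class AttentionFeatureFusion(nn.Module):
    """Hypothetical sketch of attention-based fusion of multi-level
    image features: a single learnable query attends over feature tokens
    from several context levels and pools them into one target embedding,
    with no text input and no transformer encoder involved."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learnable pooling query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, level_feats):
        # level_feats: list of (batch, n_tokens_i, dim) tensors, one per
        # context level (e.g. whole image, grid regions, detected RoIs).
        tokens = torch.cat(level_feats, dim=1)         # (B, N, dim)
        q = self.query.expand(tokens.size(0), -1, -1)  # (B, 1, dim)
        fused, _ = self.attn(q, tokens, tokens)        # attend over all levels
        return self.proj(fused.squeeze(1))             # (B, dim) target embedding

# Usage with three assumed feature levels for a batch of 4 target images.
fusion = AttentionFeatureFusion(dim=512)
feats = [torch.randn(4, n, 512) for n in (1, 49, 8)]   # global, grid, RoI tokens
print(fusion(feats).shape)                             # torch.Size([4, 512])
```

Because the target side skips the transformer entirely, target embeddings can be precomputed and indexed, which is one plausible motivation for the asymmetry described in the abstract.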

Authors

-

Reviews

Overall rating

3.8
Insufficient ratings

Secondary ratings

Novelty
-
Significance
-
Scientific rigor
-

Recommendations

No data available