Proceedings Paper

Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Journal

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)

Volume -, Issue -, Pages 11385-11395

Publisher

IEEE

DOI: 10.1109/ICCV48922.2021.01121

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Funding

National Natural Science Foundation of China [61872440, 62061136007]
Royal Society Newton Advanced Fellowship [NAF\R2\192151]
Youth Innovation Promotion Association CAS
Alibaba Innovative Research (AIR) Program
Open Research Projects of Zhejiang Lab [2021KE0AB06]

Summary

This study introduces a novel image-based 3D shape retrieval (IBSR) pipeline built on contrastive learning. It represents 3D shapes with multi-view grayscale rendered images and uses color transfer as data augmentation to address the challenges of contrastive learning between 2D images and 3D shapes. With an added category-level contrastive loss that helps distinguish similar objects from different categories, the approach achieves the best performance on popular IBSR benchmarks, outperforming the previous state of the art by 4% to 15% in retrieval accuracy.

Abstract

In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek to find the best-matching shape for a given single 2D image from a shape repository. Most existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning using a triplet loss. Inspired by the great success of recent contrastive learning works on self-supervised representation learning, we propose a novel IBSR pipeline leveraging contrastive learning. We note that adopting such cross-modal contrastive learning between 2D images and 3D shapes for IBSR tasks is non-trivial and challenging: contrastive learning requires very strong data augmentation in the constructed positive pairs to learn feature invariance, whereas traditional metric learning works do not have this requirement. Moreover, object shape and appearance are entangled in 2D query images, which makes the learning task more difficult than contrasting single-modal data. To mitigate these challenges, we propose to use multi-view grayscale rendered images of the 3D shapes as the shape representation. We then introduce a strong data augmentation technique based on color transfer, which can significantly but naturally change the appearance of the query image, effectively satisfying the needs of contrastive learning. Finally, we propose to incorporate a novel category-level contrastive loss that helps distinguish similar objects from different categories, in addition to the classic instance-level contrastive loss. Our experiments demonstrate that our approach achieves the best performance on all three popular IBSR benchmarks, including Pix3D, Stanford Cars, and Comp Cars, outperforming the previous state-of-the-art by 4% to 15% in retrieval accuracy.
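To make the two loss terms named in the abstract more concrete, the following is a minimal NumPy sketch of a cross-modal instance-level contrastive loss (InfoNCE style) and one plausible category-level variant. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the loss definitions, so the function names, the `temperature` value, mean pooling over rendered views, and the supervised-contrastive form of the category term are all hypothetical choices.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def pool_views(view_emb):
    """Assumption: a shape embedding is the mean over its V rendered views.

    view_emb has shape (batch, views, dim); the result has shape (batch, dim).
    """
    return view_emb.mean(axis=1)


def instance_contrastive_loss(img_emb, shape_emb, temperature=0.1):
    """InfoNCE-style loss: image i's positive is shape i, and the other
    shapes in the batch act as negatives."""
    logits = l2_normalize(img_emb) @ l2_normalize(shape_emb).T / temperature
    log_prob = log_softmax(logits, axis=1)
    return -np.mean(np.diag(log_prob))


def category_contrastive_loss(img_emb, shape_emb, labels, temperature=0.1):
    """Hypothetical category-level term: every shape that shares the image's
    category label counts as a positive (supervised-contrastive style)."""
    logits = l2_normalize(img_emb) @ l2_normalize(shape_emb).T / temperature
    log_prob = log_softmax(logits, axis=1)
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]).astype(float)
    # Each row has at least one positive (its own shape), so no division by zero.
    per_image = -(pos_mask * log_prob).sum(axis=1) / pos_mask.sum(axis=1)
    return per_image.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 16))      # stand-in image embeddings
    views = rng.normal(size=(4, 12, 16))   # 12 rendered views per shape
    shapes = pool_views(views)
    total = (instance_contrastive_loss(images, shapes)
             + category_contrastive_loss(images, shapes, [0, 0, 1, 1]))
    print(float(total))
```

In this sketch the two terms are simply summed; how the paper weights or combines them is not stated in the abstract.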
