☆ 4.7 Article

Learning to Predict 3D Surfaces of Sculptures from Single and Multiple Views

INTERNATIONAL JOURNAL OF COMPUTER VISION (2019)

期刊

INTERNATIONAL JOURNAL OF COMPUTER VISION

卷 127, 期 11-12, 页码 1780-1800

出版社

SPRINGER

DOI: 10.1007/s11263-018-1124-0

关键词

Visual hull; Generative model; Silhouette prediction; Depth prediction; Convolutional neural networks; Sculpture dataset

类别

Computer Science, Artificial Intelligence

资金

EPSRC studentship
EPSRC Programme Grant [Seebibyte EP/M013774/1]
EPSRC [1798398, EP/M013774/1] Funding Source: UKRI

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

The objective of this work is to reconstruct the 3D surfaces of sculptures from one or more images using a view-dependent representation. To this end, we train a network, SiDeNet, to predict the Silhouette and Depth of the surface given a variable number of images; the silhouette is predicted at a different viewpoint from the inputs (e.g. from the side), while the depth is predicted at the viewpoint of the input images. This has three benefits. First, the network learns a representation of shape beyond that of a single viewpoint, as the silhouette forces it to respect the visual hull, and the depth image forces it to predict concavities (which don't appear on the visual hull). Second, as the network learns about 3D using the proxy tasks of predicting depth and silhouette images, it is not limited by the resolution of the 3D representation. Finally, using a view-dependent representation (e.g. additionally encoding the viewpoint with the input image) improves the network's generalisability to unseen objects. Additionally, the network is able to handle the input views in a flexible manner. First, it can ingest a different number of views during training and testing, and it is shown that the reconstruction performance improves as additional views are added at test-time. Second, the additional views do not need to be photometrically consistent. The network is trained and evaluated on two synthetic datasets-a realistic sculpture dataset (SketchFab), and ShapeNet. The design of the network is validated by comparing to state of the art methods for a set of tasks. It is shown that (i) passing the input viewpoint (i.e. using a view-dependent representation) improves the network's generalisability at test time. (ii) Predicting depth/silhouette images allows for higher quality predictions in 2D, as the network is not limited by the chosen latent 3D representation. (iii) On both datasets the method of combining views in a global manner performs better than a local method. Finally, we show that the trained network generalizes to real images, and probe how the network has encoded the latent 3D shape.

Learning to Predict 3D Surfaces of Sculptures from Single and Multiple Views

期刊

INTERNATIONAL JOURNAL OF COMPUTER VISION

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Learning to Predict 3D Surfaces of Sculptures from Single and Multiple Views

期刊

INTERNATIONAL JOURNAL OF COMPUTER VISION

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文