4.8 Article

Towards Accurate Reconstruction of 3D Scene Shape From A Single Monocular Image

Publisher

IEEE Computer Society
DOI: 10.1109/TPAMI.2022.3209968

Keywords

Three-dimensional displays; Shape; Point cloud compression; Estimation; Training; Image reconstruction; Solid modeling; Monocular depth prediction; 3D reconstruction; 3D scene shape estimation

Abstract

Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate accurate scene shapes by training on large-scale relative depth data, and estimating the depth shift. To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. As the two modules are trained separately, we do not need strictly paired training data. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to improve training with relative depth annotation. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation. Code is available at: https://github.com/aim-uofa/depth/.
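
For readers who want a concrete picture of the pipeline described in the abstract, the sketch below (not the authors' released code) illustrates the two ingredients in Python/PyTorch: lifting a depth map predicted only up to an unknown shift into a 3D point cloud once a depth shift and focal length have been estimated, and an image-level normalized regression loss that tolerates the unknown scale and shift of relative depth ground truth. The function names, the pinhole unprojection, and the median / mean-absolute-deviation normalization are illustrative assumptions rather than the paper's exact formulation; the reference implementation is at https://github.com/aim-uofa/depth/.

```python
import torch


def unproject_to_point_cloud(depth, shift, focal_length, cx, cy):
    """Lift a depth map predicted up to an unknown shift into a 3D point
    cloud, given an estimated depth shift and focal length (pinhole model).
    depth: (H, W) tensor; focal_length in pixels; (cx, cy) principal point."""
    h, w = depth.shape
    metric_depth = depth + shift  # undo the estimated depth shift
    v, u = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype),
        torch.arange(w, dtype=depth.dtype),
        indexing="ij",
    )
    x = (u - cx) / focal_length * metric_depth  # back-project pixel columns
    y = (v - cy) / focal_length * metric_depth  # back-project pixel rows
    return torch.stack((x, y, metric_depth), dim=-1)  # (H, W, 3) points


def image_level_normalized_loss(pred, gt, valid):
    """Scale- and shift-invariant regression loss: both depth maps are
    normalized per image (here with median and mean absolute deviation, an
    assumption) before an L1 penalty, so relative-depth ground truth with
    unknown scale and shift can still supervise the network."""
    def normalize(d):
        d = d[valid]
        med = d.median()
        mad = (d - med).abs().mean().clamp(min=1e-6)
        return (d - med) / mad

    return (normalize(pred) - normalize(gt)).abs().mean()


if __name__ == "__main__":
    # Toy example with hypothetical values for the shift and focal length.
    depth_pred = torch.rand(384, 384)
    points = unproject_to_point_cloud(
        depth_pred, shift=1.5, focal_length=500.0, cx=192.0, cy=192.0
    )
    print(points.shape)  # torch.Size([384, 384, 3])
```

Normalizing both maps per image before the L1 penalty is what makes relative depth annotations usable for training: any global scale or shift in the ground truth is cancelled by the normalization, so the loss only penalizes errors in the relative depth structure.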

