Article

MODE: Monocular omnidirectional depth estimation via consistent depth fusion

Journal

IMAGE AND VISION COMPUTING
Volume 136

Publisher

ELSEVIER
DOI: 10.1016/j.imavis.2023.104723

Keywords

Omnidirectional depth estimation; Depth initialization; Long-range dependency


Monocular depth estimation has made significant progress in recent years, but results on omnidirectional images remain unsatisfying. In this paper, we propose MODE, a novel network that addresses the challenges of omnidirectional depth estimation and improves performance through a set of flexible modules. The proposed method is validated on widely used datasets, and the effectiveness of each module is shown through an ablation study on real-world datasets.
Monocular depth estimation has seen significant progress in recent years, especially in outdoor scenes. However, depth estimation results are not satisfying on omnidirectional images. Compared to perspective images, estimating the depth map from an omnidirectional image captured in an outdoor scene with a neural network poses two additional challenges: (i) the depth range of outdoor images varies widely across scenes, making it difficult for a depth network trained on an indoor dataset to predict accurate depth results; moreover, the maximum distance in outdoor scenes is largely constant because the camera sees the sky, yet depth labels for this region are entirely missing in existing datasets; (ii) the standard representation of omnidirectional images introduces spherical distortion, which makes it difficult for a vanilla network to predict accurate relative structural depth details. In this paper, we propose a novel network, MODE, that gives special consideration to these challenges and designs a set of flexible modules to improve omnidirectional depth estimation. First, a consistent depth structure module is proposed to estimate a consistent depth structure map, and the predicted structural map improves depth details. Second, to suit the characteristics of spherical sampling, we propose a strip convolution fusion module to enhance long-range dependencies. Third, rather than using a single depth decoder branch as in previous methods, we propose a semantics decoder branch to estimate sky regions in the omnidirectional image. The proposed method is validated on three widely used datasets, demonstrating state-of-the-art performance. Moreover, the effectiveness of each module is shown through an ablation study on real-world datasets. Our code is available at https://github.com/lkku1/MODE.

© 2023 Elsevier B.V. All rights reserved.
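The strip convolution fusion idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is an assumed, hypothetical implementation for illustration only (the class name, kernel length, and residual fusion are our choices, not the paper's): long horizontal (1 × k) and vertical (k × 1) kernels gather long-range context along the rows and columns of an equirectangular image, and their responses are merged with a 1 × 1 convolution.

```python
import torch
import torch.nn as nn


class StripConvFusion(nn.Module):
    """Hypothetical sketch of a strip-convolution fusion block.

    Horizontal (1 x k) and vertical (k x 1) strip kernels capture
    long-range dependencies along the latitude/longitude directions of
    an equirectangular image; a 1x1 convolution fuses both responses.
    """

    def __init__(self, channels: int, k: int = 9):
        super().__init__()
        # "Same" padding keeps the spatial resolution unchanged.
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2))
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.horizontal(x)
        v = self.vertical(x)
        # Concatenate both strip responses, fuse, and add a residual path.
        return self.fuse(torch.cat([h, v], dim=1)) + x


# Equirectangular feature maps have a 2:1 width-to-height ratio.
x = torch.randn(1, 16, 32, 64)  # (batch, channels, H, W)
y = StripConvFusion(16)(x)
print(tuple(y.shape))  # (1, 16, 32, 64)
```

The residual connection lets the block refine features without discarding the local structure produced by earlier layers; the actual module in MODE may differ in kernel sizes and fusion strategy.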

