Article

Self-Supervised Monocular Depth Estimation With Geometric Prior and Pixel-Level Sensitivity

Journal

IEEE TRANSACTIONS ON INTELLIGENT VEHICLES
Volume 8, Issue 3, Pages 2244-2256

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TIV.2022.3210274

Keywords

Estimation; Costs; Training; Sensitivity; Cameras; Optical flow; Semantics; Monocular depth estimation; self-supervised learning; prior feature consistency; sensitivity adaptation


This paper proposes a self-supervised monocular depth estimation framework with geometric priors and pixel-level sensitivity. A geometric pose estimator introduces geometric constraints, and an alternating learning strategy improves the training of the prior depth predictor. Prior feature consistency regularization strengthens the learning of the depth encoder, and a sensitivity-adaptive depth decoder handles pixel-level differences in sensitivity across the input frame.
Self-supervised monocular depth estimation has gained popularity because networks can be trained without dense ground-truth depth annotations. In particular, multi-frame monocular depth estimation achieves promising results by exploiting temporal information. However, existing multi-frame solutions ignore that individual pixels of the input frame affect depth estimation differently, and geometric information remains insufficiently explored. This paper proposes a self-supervised monocular depth estimation framework with geometric priors and pixel-level sensitivity. Geometric constraints are introduced through a geometric pose estimator equipped with a prior depth predictor and an optical flow predictor. Further, an alternating learning strategy improves the training of the prior depth predictor by decoupling it from the ego-motion produced by the geometric pose estimator. On this basis, prior feature consistency regularization is applied to the depth encoder: a dense prior cost volume, built from the optical flow map and the ego-motion, serves as the supervisory signal for feature consistency learning, so the cost volume is formed from more reasonable feature matches. To handle pixel-level differences in sensitivity across the input frame, a sensitivity-adaptive depth decoder is built by flexibly adding a shorter path from the cost volume to the final depth prediction; in this way, the gradient back-propagated to the cost volume is adaptively adjusted, and an accurate depth map is decoded. The effectiveness of the proposed method is verified on public datasets.
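To make the cost-volume idea the abstract relies on concrete, here is a minimal sketch (not the authors' implementation) of a toy plane-sweep cost volume for rectified frame pairs: each candidate disparity shifts the source feature map and records the per-pixel mean absolute feature difference, and the argmin over hypotheses gives the best match per pixel. The names `cost_volume`, `ref_feat`, and `src_feat` are illustrative assumptions; real multi-frame methods instead warp features with the estimated ego-motion and camera intrinsics.

```python
import numpy as np

def cost_volume(ref_feat, src_feat, max_disp):
    """Toy plane-sweep cost volume for rectified frames.

    ref_feat, src_feat: (C, H, W) feature maps.
    For each candidate disparity d, shift src_feat right by d pixels
    and record the mean absolute feature difference per pixel.
    Returns a (max_disp, H, W) volume; argmin over axis 0 is the
    best-matching disparity hypothesis per pixel.
    """
    C, H, W = ref_feat.shape
    vol = np.zeros((max_disp, H, W), dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            shifted = src_feat
        else:
            # Shift source features right by d; out-of-view columns stay zero.
            shifted = np.zeros_like(src_feat)
            shifted[:, :, d:] = src_feat[:, :, :-d]
        vol[d] = np.abs(ref_feat - shifted).mean(axis=0)
    return vol
```

In the paper's setting, the depth decoder reads from such a volume; the proposed shorter path from the volume to the prediction changes how gradients reach these matching costs during training.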

