
Self-supervised monocular depth estimation with multi-scale structure similarity loss

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s11042-022-14012-6

Keywords

Self-supervised learning; Monocular depth estimation; Structural similarity; Attentional mechanism

Funding

  1. National Natural Science Foundation of China [51774281]


This paper proposes a self-supervised monocular depth estimation algorithm based on a multi-scale structural similarity loss, motivated by the missing depth values common in raw depth images. An attention mechanism added to the depth prediction network further improves the accuracy of the estimated depth. Experimental results show significant improvements in both accuracy and visual quality.
The raw depth image captured by a depth sensor usually contains large regions of missing depth values, and the incomplete depth map burdens many downstream vision tasks. The original luminosity loss function estimates depth incorrectly in complex texture areas and for distant moving objects. To overcome this, the paper proposes a self-supervised monocular depth estimation algorithm based on a multi-scale structural similarity loss. To enhance the depth prediction network's perception of pixel edges, a multi-scale structural similarity term is used when calculating the loss. In addition, an attention mechanism is added to the encoder stage of the depth prediction network. By adjusting the feature map, the network suppresses features that contribute little and strengthens the features that assist judgment. Experiments are conducted on the KITTI and Cityscapes datasets, and the results are compared with state-of-the-art algorithms. The results show that the proposed algorithm achieves significant improvements in accuracy, especially on the KITTI dataset, where precision is raised to 88.4%. Alongside this accuracy, the visual quality of the estimated depth is also significantly improved, especially in Cityscapes scenes with multiple overlapping people.
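The abstract does not give the exact form of the multi-scale structural similarity loss, so the following is only a minimal sketch of the general idea: compare a target image with its reconstruction using SSIM at several resolutions and average the result. The 3x3 mean window, the number of scales, the 2x average-pool downsampling, the L1 term, and the weight alpha = 0.85 are all assumptions borrowed from common self-supervised depth practice, not values from the paper. All function names are hypothetical.

```python
import numpy as np

def avg_pool3(x):
    """3x3 mean filter with edge padding; output has the same shape as x."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0

def ssim_map(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM between two grayscale images scaled to [0, 1]."""
    mu_a, mu_b = avg_pool3(a), avg_pool3(b)
    var_a = avg_pool3(a * a) - mu_a ** 2
    var_b = avg_pool3(b * b) - mu_b ** 2
    cov = avg_pool3(a * b) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def downsample2(x):
    """Halve the resolution by 2x2 average pooling (odd edges are cropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_ssim_loss(target, recon, scales=3, alpha=0.85):
    """Average of an SSIM + L1 photometric term over several resolutions.

    alpha and scales are assumed values, not taken from the paper.
    """
    target = np.asarray(target, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    total = 0.0
    for _ in range(scales):
        dssim = np.clip((1.0 - ssim_map(target, recon)) / 2.0, 0.0, 1.0).mean()
        l1 = np.abs(target - recon).mean()
        total += alpha * dssim + (1.0 - alpha) * l1
        target, recon = downsample2(target), downsample2(recon)
    return total / scales

# Usage: a reconstruction identical to the target gives a loss near zero,
# while a noisy reconstruction gives a larger loss.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
noisy = np.clip(img + 0.2 * rng.standard_normal((16, 16)), 0.0, 1.0)
loss_same = multiscale_ssim_loss(img, img)
loss_noisy = multiscale_ssim_loss(img, noisy)
```

In a self-supervised setting the reconstruction would come from warping a neighbouring frame with the predicted depth and camera pose. Evaluating the term at coarser scales makes the loss sensitive to structure beyond the 3x3 window, which is one plausible reading of how such a loss could help around edges.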
