Article

Multi-Scale Spatial Attention-Guided Monocular Depth Estimation With Semantic Enhancement

Journal

IEEE Transactions on Image Processing
Volume 30, Pages 8811-8822

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TIP.2021.3120670

Keywords

Estimation; Semantics; Mutual information; Feature extraction; Correlation; Cameras; Visualization; Depth estimation; Multi-scale spatial attention-guided; Semantic enhancement

Funding

  1. National Natural Science Foundation of China [61771091, 61871066]
  2. National High Technology Research and Development Program (863 Program) of China [2015AA016306]
  3. Natural Science Foundation of Liaoning Province of China [20170540159]
  4. Fundamental Research Fund for the Central Universities of China [DUT17LAB04]


This study presents a monocular depth estimation method with multi-scale spatial attention guidance and semantic enhancement, which can focus more on small objects and improve the sharpness of depth prediction edges. Experimental results on public benchmark datasets demonstrate the effectiveness and superior performance of the proposed method.
Depth estimation from a single monocular image is a vital but challenging task in 3D vision and scene understanding. Previous unsupervised methods have yielded impressive results, but the predicted depth maps still suffer from several shortcomings, such as missing small objects and blurred object edges. To address these problems, a multi-scale spatial attention-guided monocular depth estimation method with semantic enhancement is proposed. Specifically, we first construct a multi-scale spatial attention-guided block based on atrous spatial pyramid pooling and spatial attention. Then, the correlation between the left and right views is fully explored via mutual information to obtain a more robust feature representation. Finally, we design a double-path prediction network that simultaneously generates depth maps and semantic labels. The proposed multi-scale spatial attention-guided block focuses more on objects, especially small ones. Moreover, the additional semantic information makes object edges in the predicted depth maps sharper. We conduct comprehensive evaluations on public benchmark datasets such as KITTI and Make3D. The experimental results demonstrate the effectiveness of the proposed method, which achieves better performance than other self-supervised methods.
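The abstract describes a block that combines atrous spatial pyramid pooling (multi-rate context) with spatial attention. The paper's actual architecture is not reproduced here; the following is only a minimal NumPy sketch of that general idea, in which a dilated neighbour average stands in for atrous convolution, a channel-pooled sigmoid map stands in for spatial attention, and the rates, function names, and residual fusion are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def atrous_neighbor_avg(feat, rate):
    # Crude stand-in (assumption) for a dilated 3x3 operation: average each
    # pixel with its 8 neighbours sampled `rate` pixels away, zero-padded.
    H, W, _ = feat.shape
    padded = np.pad(feat, ((rate, rate), (rate, rate), (0, 0)))
    out = np.zeros_like(feat)
    for dy in (-rate, 0, rate):
        for dx in (-rate, 0, rate):
            out += padded[rate + dy:rate + dy + H, rate + dx:rate + dx + W, :]
    return out / 9.0

def spatial_attention(feat):
    # Channel-pooled spatial attention: one gating weight per pixel in (0, 1).
    avg = feat.mean(axis=-1, keepdims=True)
    mx = feat.max(axis=-1, keepdims=True)
    return sigmoid(avg + mx)

def multiscale_attention_block(feat, rates=(1, 2, 4)):
    # Aggregate context at several dilation rates (ASPP-like), derive a
    # per-pixel gate from the fused context, and re-weight the input with a
    # residual connection so attended regions (e.g. small objects) are
    # amplified rather than suppressed.
    context = np.mean([atrous_neighbor_avg(feat, r) for r in rates], axis=0)
    gate = spatial_attention(context)   # shape (H, W, 1), values in (0, 1)
    return feat + feat * gate           # residual, attention-reweighted

feat = np.random.rand(8, 8, 16).astype(np.float32)
out = multiscale_attention_block(feat)
print(out.shape)  # (8, 8, 16)
```

Because the gate lies in (0, 1) and the fusion is residual, the block can only amplify features (by up to 2x) and never zeroes them out, which is one common way such attention blocks preserve information while emphasizing salient regions.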

