Article

Unsupervised Monocular Depth Estimation With Channel and Spatial Attention

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TNNLS.2022.3221416

Keywords

Estimation; Videos; Feature extraction; Training; Cameras; Convolution; Task analysis; Channel attention; monocular depth estimation; spatial attention; unsupervised learning

Funding

  1. National Key R&D Program of China [2018YFB1305003]
  2. National Natural Science Foundation of China [61922063, 62273255]
  3. Shanghai International Science and Technology Cooperation Project [21550760900, 22510712000]
  4. Shanghai Municipal Science and Technology Major Project [2021SHZDZX0100]
  5. Fundamental Research Funds for the Central Universities

Abstract

Understanding 3-D scene geometry from video is a fundamental problem in visual perception. In this article, we propose an unsupervised framework that estimates monocular depth and camera motion from unlabeled monocular videos, overcoming the difficulty of acquiring per-pixel ground-truth depth at scale. The photometric loss, which warps nearby views to the target view using the estimated depth and pose, couples the depth and pose networks together and is essential to the unsupervised method. We introduce a channelwise attention mechanism to exploit the relationships between channels and a spatialwise attention mechanism to exploit the spatial structure within features. Applied in the depth network, both mechanisms better activate feature information across convolutional layers and extract more discriminative features. In addition, we apply the Sobel operator to our edge-aware smoothness term, yielding more accurate depth and clearer boundaries and structures. Together, these components narrow the gap with fully supervised methods and achieve state-of-the-art results on the KITTI benchmark, with strong generalization performance on the Make3D dataset.
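The abstract names three ingredients: channelwise attention, spatialwise attention, and Sobel-based edge-aware smoothness. The sketch below illustrates these ideas in minimal NumPy form; it is an illustration of the general techniques (SE-style channel gating, channel-pooled spatial gating, and the standard edge-aware smoothness loss with Sobel gradients), not the paper's actual architecture, and the random weights and scalar gating stand in for learned convolutional layers.

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """SE-style channel gating sketch: squeeze (global average pool per
    channel), excite (two small linear layers, stand-ins for learned
    weights), then sigmoid-gate each channel. feat has shape (C, H, W)."""
    C = feat.shape[0]
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((C // reduction, C)) * 0.1  # hypothetical learned weights
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    squeeze = feat.mean(axis=(1, 2))                                   # (C,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ squeeze, 0.0)))) # (C,) in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Spatial gating sketch: pool across channels (mean and max), then
    sigmoid-gate each spatial location; a learned conv would normally
    combine the pooled maps."""
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    gate = 1.0 / (1.0 + np.exp(-(avg + mx)))  # (H, W) in (0, 1)
    return feat * gate[None, :, :]

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(img, kernel):
    """Valid 3x3 cross-correlation on a single-channel image."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + H - 2, j:j + W - 2]
    return out

def edge_aware_smoothness(depth, image):
    """Penalize depth gradients, down-weighted where the image has strong
    edges; here both gradients use the Sobel operator."""
    dx, dy = conv2d(depth, SOBEL_X), conv2d(depth, SOBEL_Y)
    ix, iy = conv2d(image, SOBEL_X), conv2d(image, SOBEL_Y)
    return np.mean(np.abs(dx) * np.exp(-np.abs(ix)) +
                   np.abs(dy) * np.exp(-np.abs(iy)))
```

Because both gates are sigmoids in (0, 1), the attention modules only rescale features, preserving shape; a perfectly flat depth map incurs zero smoothness penalty regardless of image content.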

