Article

Unsupervised Monocular Depth Perception: Focusing on Moving Objects

Journal

IEEE SENSORS JOURNAL
Volume 21, Issue 24, Pages 27225-27237

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSEN.2021.3109266

Keywords

Cameras; Convolutional neural networks; Vehicle dynamics; Training; Videos; Unsupervised learning; Estimation; Visual Sensing; depth perception; unsupervised learning; monocular video; dynamic objects

Funding

  1. Shenzhen Natural Science Foundation [JCYJ20190813170601651]
  2. Shenzhen Institute of Artificial Intelligence and Robotics for Society [AC01202101006]

Abstract

This study explores unsupervised learning of depth from monocular videos using photometric errors, and introduces an outlier masking technique and an efficient weighted multi-scale scheme to improve the accuracy of the learned depth. Extensive experiments demonstrate the effectiveness of the proposed approach on depth and ego-motion estimation, and the predicted depth is additionally evaluated separately on dynamic-object and static-background regions for both supervised and unsupervised methods.
As a flexible passive 3D sensing technique, unsupervised learning of depth from monocular videos has become an important research topic. Instead of measuring the difference from ground truth, it uses as the training loss the photometric errors between the target view and the views synthesized from its adjacent source views. Despite significant recent progress, occlusion and scene dynamics in real-world scenes still adversely affect the learning. In this paper, we show that deliberately manipulating the photometric errors can deal with these difficulties more effectively. We first propose an outlier masking technique that treats occluded or dynamic pixels as statistical outliers in the photometric error map. With outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately. To the best of our knowledge, such cases have not been seriously considered in previous works, even though they pose a high risk in applications such as autonomous driving. We also propose an efficient weighted multi-scale scheme to reduce artifacts in the predicted depth maps. Extensive experiments on the KITTI dataset and additional experiments on the Cityscapes dataset verify the proposed approach's effectiveness on depth and ego-motion estimation. Furthermore, for the first time, we evaluate the predicted depth separately on dynamic-object and static-background regions, for both supervised and unsupervised methods. This evaluation further confirms the effectiveness of the proposed approach and provides some interesting observations that might inspire future research in this direction.
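
For readers who want a concrete picture of the loss manipulation described in the abstract, the sketch below gives one plausible PyTorch reading of the two ideas: masking statistical outliers in the photometric error map, and combining per-scale losses with fixed weights. The quantile threshold, scale weights, and function names are illustrative assumptions, not the authors' released implementation.

# A minimal sketch, assuming the outlier mask thresholds the per-image
# photometric error distribution at a high quantile and that multi-scale
# losses are averaged with fixed weights. All specifics are assumptions.
import torch

def photometric_error(target, synthesized):
    # Per-pixel L1 photometric error between the target view I_t and a view
    # synthesized from an adjacent source frame; returns shape (B, 1, H, W).
    return (target - synthesized).abs().mean(dim=1, keepdim=True)

def outlier_mask(error_map, outlier_quantile=0.95):
    # Keep pixels whose error lies below a high per-image quantile; pixels
    # above it (likely occluded or on moving objects) are excluded from the loss.
    b = error_map.shape[0]
    thresh = torch.quantile(error_map.view(b, -1), outlier_quantile, dim=1)
    return (error_map < thresh.view(b, 1, 1, 1)).float()

def weighted_multiscale_loss(losses_per_scale, weights=(1.0, 0.5, 0.25, 0.125)):
    # Weight the photometric loss computed at each decoder scale; finer scales
    # receive larger weights in this sketch.
    total = sum(w * l for w, l in zip(weights, losses_per_scale))
    return total / sum(weights[:len(losses_per_scale)])

# Usage at a single scale:
target = torch.rand(2, 3, 128, 416)        # target frame I_t
synthesized = torch.rand(2, 3, 128, 416)   # source frame warped into I_t's view
err = photometric_error(target, synthesized)
mask = outlier_mask(err)
masked_loss = (err * mask).sum() / mask.sum().clamp(min=1.0)

Masking against a per-image statistic rather than a fixed threshold is what would let the network ignore pixels such as those on oncoming vehicles, whose apparent motion violates the static-scene assumption behind view synthesis.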
