Proceedings Paper

Multi-scale Residual Pyramid Attention Network for Monocular Depth Estimation

Publisher

IEEE Computer Society
DOI: 10.1109/ICPR48806.2021.9412670

Keywords

monocular depth estimation; convolutional neural networks; multi-scale; attention module; context

Funding

  1. National Natural Science Foundation of China [61532002, 61702482, 61802109]
  2. Major Program of National Natural Science Foundation of China [91938301]
  3. National Defense Equipment Advance Research Shared Technology Program of China [41402050301-170441402065]
  4. Sichuan Science and Technology Major Project on New Generation Artificial Intelligence [2018GZDZX0034]
  5. Natural Science Foundation of Hebei Province [F2020205006]
  6. Top Youth Talents of Science and Technology Research Project in Hebei Province [BJ2020059]
  7. Science Foundation of Hebei Normal University [L2018K02]
  8. Open Foundation of Beijing Key Laboratory of Mobile Computing and Pervasive Device

In this paper, an end-to-end multi-scale residual pyramid attention network (MRPAN) is proposed to solve the challenging problem of monocular depth estimation in computer vision. By introducing a multi-scale attention context aggregation module and an improved residual refinement module, the network is able to better recover complex structures and retain more local details in scenes.
Monocular depth estimation is a challenging problem in computer vision and is crucial for understanding 3D scene geometry. Recently, methods based on deep convolutional neural networks (DCNNs) have significantly improved estimation accuracy. However, existing methods fail to account for the complex textures and geometries in scenes, resulting in loss of local details, distorted object boundaries, and blurry reconstructions. In this paper, we propose an end-to-end multi-scale residual pyramid attention network (MRPAN) to mitigate these problems. First, we propose a multi-scale attention context aggregation (MACA) module, which consists of a spatial attention module (SAM) and a global attention module (GAM). By considering the position and scale correlations of pixels from spatial and global perspectives, the proposed module adaptively learns the similarity between pixels, obtaining richer global context information and recovering complex structures in the scene. Then we propose an improved residual refinement module (RRM) that further refines the scene structure, capturing deeper semantic information and retaining more local details. Experimental results show that our method achieves more promising performance on object boundaries and local details than other state-of-the-art methods.
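
The paper itself includes no code, so the following is a minimal PyTorch sketch of what a multi-scale attention context aggregation block could look like: a non-local-style spatial attention (SAM), a squeeze-and-excitation-style global attention (GAM), and an average-pooling pyramid over several scales. All class names, channel widths, pooling scales, and the fusion scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical MACA sketch. Module names, channel sizes, and the fusion
# scheme are assumptions for illustration; the paper's actual architecture
# may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Non-local-style spatial attention: each position attends to all others."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                     # (b, c', hw)
        attn = torch.softmax(q @ k, dim=-1)            # pixel-pixel similarity
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                    # residual connection

class GlobalAttention(nn.Module):
    """Channel-wise global attention from pooled context (squeeze-excite style)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight channels by global context

class MACA(nn.Module):
    """Apply spatial + global attention at several pyramid scales, then fuse."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            nn.Sequential(SpatialAttention(channels), GlobalAttention(channels))
            for _ in scales
        )
        self.fuse = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for s, branch in zip(self.scales, self.branches):
            y = F.avg_pool2d(x, s) if s > 1 else x     # downsample to scale s
            y = branch(y)
            if s > 1:
                y = F.interpolate(y, size=(h, w), mode='bilinear',
                                  align_corners=False)
            outs.append(y)
        return self.fuse(torch.cat(outs, dim=1))

# quick shape check
if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    print(MACA(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```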

