Article

GCNDepth: Self-supervised monocular depth estimation based on graph convolutional network

Journal

NEUROCOMPUTING
Volume 517, Pages 81-92

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2022.10.073

Keywords

Deep learning; Graph convolutional network; Monocular depth estimation; Self-supervision


This study proposes a new self-supervised monocular depth estimation model that utilizes a graph convolutional network (GCN) to handle irregular image regions and enhances the quantitative and qualitative quality of the predicted depth maps. The method achieves a high prediction accuracy of 89% on the KITTI dataset and reduces the number of trainable parameters by 40% compared to existing solutions.
Depth estimation is a challenging task of 3D reconstruction that enhances the accuracy of environment awareness. This work brings a new solution with improvements, which increases the quantitative and qualitative understanding of depth maps compared to existing methods. Recently, convolutional neural networks (CNNs) have demonstrated their extraordinary ability to estimate depth maps from monocular videos. However, traditional CNNs do not support a topological structure, and they can work only on regular image regions with determined sizes and weights. On the other hand, graph convolutional networks (GCNs) can handle the convolution of non-Euclidean data, and they can be applied to irregular image regions within a topological structure. Therefore, to preserve the geometric appearances and locations of objects in the scene, in this work we aim to exploit GCNs for a self-supervised monocular depth estimation model. Our model consists of two parallel auto-encoder networks: the first is an auto-encoder that relies on ResNet-50 to extract features from the input image and on a multi-scale GCN to estimate the depth map. In turn, the second network is used to estimate the ego-motion vector (i.e., 3D pose) between two consecutive frames based on ResNet-18. The estimated 3D pose and depth map are used to reconstruct the target image. A combination of loss functions related to photometric consistency, reprojection, and smoothness is used to cope with bad depth predictions and preserve the discontinuities of objects. Our method improves performance both quantitatively and qualitatively. In particular, it provides comparable and promising results with a high prediction accuracy of 89% on the publicly available KITTI dataset.
Our method also offers a 40% reduction in the number of trainable parameters compared to state-of-the-art solutions. In addition, we tested our trained model on the Make3D dataset to evaluate it on a new dataset with low-resolution images. The source code is publicly available at https://github.com/ArminMasoumian/GCNDepth.git.
(c) 2022 Elsevier B.V. All rights reserved.
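To illustrate the graph-convolution operation the abstract contrasts with ordinary CNN convolution, the sketch below implements the standard normalized GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). This is a minimal NumPy sketch of the generic technique, not the authors' multi-scale GCN decoder; the function name, the toy graph, and the weight values are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    A: (N, N) adjacency matrix of the graph (here, a toy stand-in for
       connectivity between irregular image regions),
    H: (N, F_in) node feature matrix,
    W: (F_in, F_out) trainable weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # degrees of the self-looped graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)   # linear transform + ReLU

# Toy example: 3 nodes in a chain, 2 input features -> 1 output feature
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.array([[1.0],
              [1.0]])
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 1)
```

Unlike a fixed-size CNN kernel, the same weight matrix W is shared across all nodes regardless of how many neighbors each node has, which is what lets a GCN operate on irregular, non-grid regions.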
