4.7 Article

A Hierarchical Deformable Deep Neural Network and an Aerial Image Benchmark Dataset for Surface Multiview Stereo Reconstruction

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TGRS.2023.3234694

Keywords

Deep learning; depth map-based stereo recon-struction; digital surface model (DSM); multiview stereo (MVS) reconstruction

Ask authors/readers for more resources

This article introduces a new benchmark dataset called the LuoJia-MVS dataset and a new deep neural network called the HDC-MVSNet. The LuoJia-MVS dataset contains 7972 five-view images with a spatial resolution of 10 cm, pixel-wise depths, and precise camera parameters. The HDC-MVSNet network is designed with a new full-scale feature pyramid extraction module, a hierarchical set of 3-D convolutional blocks, and true 3-D deformable convolutional layers by considering the deformation problem and scale variation issue of aerial images.
Multiview stereo (MVS) aerial image depth estimation is a research frontier in the remote sensing field. Recent deep learning-based advances in close-range object reconstruction have suggested the great potential of this approach. Meanwhile, the deformation problem and the scale variation issue are also worthy of attention. These characteristics of aerial images limit the applicability of the current methods for aerial image depth estimation. Moreover, there are few available benchmark datasets for aerial image depth estimation. In this regard, this article describes a new benchmark dataset called the LuoJia-MVS dataset (https://irsip.whu.edu.cn/resources/resources_en_v2.php), as well as a new deep neural network known as the hierarchical deformable cascade MVS network (HDC-MVSNet). The LuoJia-MVS dataset contains 7972 five-view images with a spatial resolution of 10 cm, pixel-wise depths, and precise camera parameters, and was generated from an accurate digital surface model (DSM) built from thousands of stereo aerial images. In the HDC-MVSNet network, a new full-scale feature pyramid extraction module, a hierarchical set of 3-D convolutional blocks, and true 3-D deformable 3-D convolutional layers are specifically designed by considering the aforementioned characteristics of aerial images. Overall and ablation experiments on the WHU and LuoJia-MVS datasets validated the superiority of HDC-MVSNet over the current state-of-the-art MVS depth estimation methods and confirmed that the newly built dataset can provide an effective benchmark.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available