4.7 Article

A general deep learning based framework for 3D reconstruction from multi-view stereo satellite images

Journal

ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
Volume 195, Pages 446-461

Publisher

ELSEVIER
DOI: 10.1016/j.isprsjprs.2022.12.012

Keywords

Multi-view stereo; Optical satellite images; Deep learning; Dense matching; 3D reconstruction

In this paper, a general deep learning framework named Sat-MVSF is proposed for three-dimensional (3D) reconstruction of the Earth's surface from multi-view optical satellite images. The framework includes pre-processing, a multi-view stereo network specifically designed for satellite imagery (Sat-MVSNet), and post-processing. The network combines deep feature extraction, rational polynomial camera warping, pyramid cost volume construction, regularization, and regression, and a self-refinement strategy improves performance and robustness. Comparative experiments demonstrate the potential of the proposed framework relative to commercial and open-source methods. The authors also emphasize the need for more high-quality open-source training data to facilitate research in this field.
In this paper, we propose a general deep learning based framework, named Sat-MVSF, to perform three-dimensional (3D) reconstruction of the Earth's surface from multi-view optical satellite images. The framework is a complete processing pipeline, including pre-processing, a multi-view stereo (MVS) network for satellite imagery (Sat-MVSNet), and post-processing. The pre-processing handles the geometric and radiometric configuration of the multi-view images and their cropping. The cropped multi-view patches are then fed into Sat-MVSNet, which includes deep feature extraction, rational polynomial camera (RPC) warping, pyramid cost volume construction, regularization, and regression, to obtain the height maps. In the post-processing, erroneous matches are filtered out and a digital surface model (DSM) is generated. Considering the complexity and diversity of real-world scenes, we also introduce a self-refinement strategy that does not require any ground-truth labels to enhance the performance and robustness of the Sat-MVSF framework. We comprehensively compare the proposed framework with popular commercial software and open-source methods to demonstrate the potential of the proposed deep learning framework. On the WHU-TLC dataset, where the images are captured with a three-line camera (TLC), the proposed framework outperforms all the other solutions in terms of reconstruction fineness, and also outperforms most of the other methods in terms of efficiency. On the challenging MVS3D dataset, where the images are captured by the WorldView-3 satellite at different times and seasons, the proposed framework also outperforms the existing methods when using the model pretrained on aerial images together with the introduced self-refinement strategy, demonstrating a high generalization ability. We also note that the lack of training samples hinders research in this field, and the availability of more high-quality open-source training data will greatly accelerate research into deep learning based MVS satellite image reconstruction. The code will be available at https://gpcv.whu.edu.cn/data.
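
The abstract names a regression step that turns the regularized cost volume into height maps but does not say how it is done. As a rough illustration only, the sketch below assumes a soft-argmin regression of the kind commonly used in MVSNet-style networks: each pixel's height is the expectation over a set of height hypotheses, weighted by a softmax of the negated matching costs. The function names, the nested-list layout `cost_volume[d][r][c]`, and the sample values are hypothetical and are not taken from the Sat-MVSF code.

```python
import math
from typing import List, Sequence


def soft_argmin_height(costs: Sequence[float], heights: Sequence[float]) -> float:
    """Expected height under a softmax over negated matching costs.

    Assumption: lower cost means a better match (not stated in the abstract).
    """
    if not costs or len(costs) != len(heights):
        raise ValueError("costs and heights must be non-empty and of equal length")
    logits = [-c for c in costs]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return sum(w * h for w, h in zip(weights, heights)) / total


def regress_height_map(
    cost_volume: Sequence[Sequence[Sequence[float]]],
    heights: Sequence[float],
) -> List[List[float]]:
    """Apply the per-pixel regression to a volume indexed as [hypothesis][row][col]."""
    rows = len(cost_volume[0])
    cols = len(cost_volume[0][0])
    return [
        [
            soft_argmin_height([plane[r][c] for plane in cost_volume], heights)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


if __name__ == "__main__":
    # Two hypothetical height planes (100 m and 200 m) over a 1 x 2 patch.
    volume = [[[0.0, 50.0]], [[50.0, 0.0]]]
    print(regress_height_map(volume, [100.0, 200.0]))
```

Because the result is a weighted average rather than a hard selection of the best hypothesis, the regressed height can fall between the sampled height planes, which is one reason this style of regression is popular for sub-interval accuracy.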
