Deformable convolutions in multi-view stereo

IMAGE AND VISION COMPUTING (2022)

Journal

IMAGE AND VISION COMPUTING

Volume 118

Publisher

ELSEVIER

DOI: 10.1016/j.imavis.2021.104369

Keywords

Multi-view stereo; Depth map; Deep learning

Funding

Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES) [001]

Automated Summary

Multi-View Stereo (MVS) is a critical step in photogrammetry and relies on the ability to match features across different images. Convolutional Neural Networks have been used to solve this problem, but they consume a large amount of Video RAM. This study reduces GPU memory usage and introduces deformable convolutions to improve performance.

Abstract

Multi-View Stereo (MVS) is a key process in the photogrammetry workflow. It takes the camera views and finds the maximum number of matches between the images, yielding a dense point cloud of the observed scene. Because this process is based on matching between images, it depends heavily on the ability to match features across different images. To improve matching performance, several researchers have proposed using Convolutional Neural Networks (CNNs) to solve the MVS problem. Despite the progress CNNs have brought to MVS, the Video RAM (VRAM) consumption of these approaches is usually far greater than that of classical methods, which rely more on RAM, and RAM is cheaper to expand than VRAM. This work builds on the progress made by CasMVSNet in reducing GPU memory usage and further studies changes to the feature extraction process. Average Group-wise Correlation is used in cost volume generation to reduce the number of channels in the cost volume, which lowers GPU memory usage without noticeable penalties in the result. Deformable convolutions are applied in the feature extraction network to augment the spatial sampling locations with learned offsets, without additional supervision, further improving the network's ability to model transformations. The impact of these changes is measured with quantitative and qualitative tests on the DTU and Tanks and Temples datasets. On the DTU dataset, the modifications reduced GPU memory usage by 32% and improved completeness by 9%, with a 6.6% penalty in accuracy.

© 2021 Published by Elsevier B.V.
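The channel reduction behind Average Group-wise Correlation can be illustrated per pixel and per depth hypothesis. The sketch below assumes GwcNet-style group-wise correlation, where the C feature channels are split into G equal groups and each group yields one scaled inner product. It also assumes that "average" means a plain mean of the per-view results over the warped source views. The function names and feature sizes are hypothetical and do not come from the paper. The sketch works on plain Python lists rather than GPU tensors, so it only shows the arithmetic.

```python
from typing import List, Sequence


def groupwise_correlation(ref: Sequence[float], src: Sequence[float], groups: int) -> List[float]:
    """Correlate two C-channel feature vectors group by group (hypothetical helper).

    The C channels are split into `groups` equal slices; each output entry is the
    mean of the element-wise products inside one slice (a scaled inner product).
    """
    channels = len(ref)
    if len(src) != channels or channels % groups != 0:
        raise ValueError("feature vectors must match in length and divide evenly into groups")
    size = channels // groups
    out = []
    for g in range(groups):
        lo, hi = g * size, (g + 1) * size
        out.append(sum(r * s for r, s in zip(ref[lo:hi], src[lo:hi])) / size)
    return out


def average_groupwise_cost(ref: Sequence[float], warped_srcs: Sequence[Sequence[float]], groups: int) -> List[float]:
    """Average the group-wise correlations over all warped source views.

    Assumption: a plain mean over source views. The result has `groups` channels
    instead of the C channels that a variance-based cost volume would keep.
    """
    per_view = [groupwise_correlation(ref, src, groups) for src in warped_srcs]
    return [sum(view[g] for view in per_view) / len(per_view) for g in range(groups)]


def cost_volume_elements(channels: int, depths: int, height: int, width: int) -> int:
    """Number of values stored in a channels x depths x height x width cost volume."""
    return channels * depths * height * width


if __name__ == "__main__":
    ref = [1.0, 0.0, 2.0, 1.0]
    srcs = [[1.0, 1.0, 2.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
    print(average_groupwise_cost(ref, srcs, groups=2))  # [0.25, 1.75]
    # Hypothetical sizes: 8 feature channels versus 4 correlation groups.
    full = cost_volume_elements(8, 48, 128, 160)
    grouped = cost_volume_elements(4, 48, 128, 160)
    print(grouped / full)  # 0.5
```

With C = 8 and G = 4, the stored cost volume halves regardless of the number of depth hypotheses or the resolution. The 32% reduction reported in the abstract is the authors' measurement for the whole network. This toy ratio is not meant to reproduce it.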

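A deformable convolution adds a fractional offset to each tap of the regular kernel grid and reads the input at the shifted position using bilinear interpolation. The sketch below assumes a single-channel 3x3 kernel evaluated at one output pixel and takes the offsets as given inputs. In a real network, an extra convolution over the same feature map would predict the offsets, and they would be trained end to end with the rest of the network. This is what lets the offsets be learned without additional supervision. The helper names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]

# Regular 3x3 sampling grid p_k around the output location p0.
KERNEL_GRID: List[Tuple[int, int]] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at a fractional (y, x); neighbours outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def deform_conv_at(img: Grid, weights: Sequence[float],
                   offsets: Sequence[Tuple[float, float]], py: int, px: int) -> float:
    """Single-channel 3x3 deformable convolution at output pixel p0 = (py, px).

    Computes sum_k w_k * x(p0 + p_k + dp_k), where dp_k = offsets[k] is a
    fractional (dy, dx) shift of the k-th kernel tap, sampled bilinearly.
    """
    if len(weights) != len(KERNEL_GRID) or len(offsets) != len(KERNEL_GRID):
        raise ValueError("expected 9 weights and 9 (dy, dx) offsets")
    out = 0.0
    for w, (gy, gx), (oy, ox) in zip(weights, KERNEL_GRID, offsets):
        out += w * bilinear(img, py + gy + oy, px + gx + ox)
    return out


if __name__ == "__main__":
    img = [[4.0 * y + x for x in range(4)] for y in range(4)]
    ones = [1.0] * 9
    # Zero offsets reduce to an ordinary 3x3 convolution.
    print(deform_conv_at(img, ones, [(0.0, 0.0)] * 9, 1, 1))  # 45.0
    # Shifting every tap half a pixel to the right changes what is sampled.
    print(deform_conv_at(img, ones, [(0.0, 0.5)] * 9, 1, 1))  # 49.5
```

When all offsets are zero, the layer is exactly an ordinary convolution, so the learned offsets can only extend the regular sampling pattern. At training scale, PyTorch users would more likely use `torchvision.ops.DeformConv2d`, whose forward pass takes the offset tensor as a separate argument.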