Article

VidSfM: Robust and Accurate Structure-From-Motion for Monocular Videos

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 31, Pages 2449-2462

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2022.3156375

Keywords

Cameras; Image reconstruction; Videos; Simultaneous localization and mapping; Video sequences; Robustness; Scalability; Structure from motion; computational geometry; computer vision

Funding

  1. National Natural Science Foundation of China [62073320, U1805264, 61873265]
  2. Didi Gaia Foundation

With the popularization of smartphones, larger collections of high-quality videos are available, which dramatically increases the scale of scene reconstruction. However, high-resolution video produces more match outliers, and high-frame-rate video brings more redundant images. To solve these problems, a tailor-made framework is proposed to realize accurate and robust structure-from-motion from monocular videos. It rests on two key ideas: using the spatial and temporal continuity of video sequences to improve the accuracy and robustness of reconstruction, and using the redundancy of video sequences to improve the efficiency and scalability of the system. Our technical contributions include an adaptive way to identify accurate loop matching pairs, a cluster-based camera registration algorithm, a local rotation averaging scheme to verify the pose estimate, and a local image extension strategy to reboot the incremental reconstruction. In addition, our system can integrate data from different video sequences, allowing multiple videos to be reconstructed simultaneously. Extensive experiments on both indoor and outdoor monocular videos demonstrate that our method outperforms state-of-the-art approaches in robustness, accuracy, and scalability.
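
The abstract does not spell out how the rotation-based pose verification works, so the following is only an illustrative sketch of the general idea of checking a newly registered camera's rotation against pairwise measurements; it is not the paper's algorithm. It assumes world-to-camera rotations, the convention R_ij = R_j R_i^T for the relative rotation from camera i to camera j, and a hypothetical 2-degree threshold. All function names are invented for this example.

```python
import math

def rot_z(theta):
    """Rotation by theta radians about the z-axis, as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix (the inverse, for a rotation)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rotation_angle_deg(r):
    """Rotation angle of r in degrees, recovered from its trace."""
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against round-off
    return math.degrees(math.acos(cos_angle))

def pairwise_residual_deg(r_i, r_j, r_ij):
    """Angle between the measured R_ij and the predicted R_j * R_i^T."""
    predicted = matmul(r_j, transpose(r_i))
    return rotation_angle_deg(matmul(transpose(r_ij), predicted))

def pose_is_consistent(r_new, neighbors, max_deg=2.0):
    """Accept r_new if the (upper) median residual is within max_deg.

    neighbors: list of (R_i, R_ij) pairs, where R_i is an already
    registered camera's rotation and R_ij is the measured relative
    rotation from that camera to the new one. max_deg is an assumed
    threshold, not a value taken from the paper.
    """
    residuals = sorted(pairwise_residual_deg(r_i, r_new, r_ij)
                       for r_i, r_ij in neighbors)
    return residuals[len(residuals) // 2] <= max_deg
```

Using a median rather than a mean is a design choice made here so that a single outlier measurement among the neighbors does not reject an otherwise well-supported pose.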
