Journal
2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV
Pages 155-165
Publisher
IEEE
DOI: 10.1109/3DV57658.2022.00028
Funding
- Toyota Research Institute (TRI)
We propose a simple baseline that directly estimates the relative pose between two images: rotation and translation, including scale. A few modifications to the Vision Transformer (ViT) bring its computations close to the Eight-Point Algorithm. The resulting method is simple yet competitive across several settings, particularly when training data is limited.
We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images. Deep methods have recently shown strong progress but often require complex or multi-stage architectures. We show that a handful of modifications can be applied to a Vision Transformer (ViT) to bring its computations close to the Eight-Point Algorithm. This inductive bias enables a simple method to be competitive in multiple settings, often substantially improving over the state of the art with strong performance gains in limited data regimes.
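For readers unfamiliar with the Eight-Point Algorithm that the abstract refers to, the following is a minimal NumPy sketch of the classical normalized version for calibrated image coordinates. It is not the paper's ViT-based method, only the geometric computation that the ViT modifications are said to approximate. The function names `normalize_points` and `eight_point_essential` are illustrative choices, and the sketch assumes at least eight noise-free or low-noise correspondences in normalized camera coordinates.

```python
import numpy as np


def normalize_points(pts):
    """Hartley normalization: zero centroid, mean distance sqrt(2).

    pts: (N, 2) array. Returns homogeneous (N, 3) points and the 3x3 transform.
    """
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T


def eight_point_essential(x1, x2):
    """Estimate E such that x2_h^T E x1_h = 0 for each correspondence.

    x1, x2: (N, 2) arrays of calibrated coordinates, N >= 8.
    The result is defined only up to sign and scale.
    """
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    # Each row holds the outer product p2 * p1^T flattened row-major,
    # so that row @ E.flatten() equals p2^T E p1.
    A = np.einsum("ni,nj->nij", p2, p1).reshape(len(p1), 9)
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Undo the normalization.
    E = T2.T @ E @ T1
    # Project onto the set of essential matrices (singular values 1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

Rotation and the translation direction can then be recovered by decomposing `E`; the classical algorithm leaves the translation scale undetermined, whereas the abstract states that the proposed method estimates translation including scale.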