Article

Scene Coordinate Regression Network With Global Context-Guided Spatial Feature Transformation for Visual Relocalization

Journal

IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 6, Issue 3, Pages 5737-5744

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2021.3082473

Keywords

Scene coordinate regression network; global context; spatial feature transformation; visual relocalization

Funding

  1. National Natural Science Foundation of China [62073322, 61633020, 61836015, 61633017]

In this letter, a global context-guided spatial feature transformation (SFT) network is proposed to learn invariant feature representation for visual relocalization from a single RGB image, achieving robustness against viewpoint changes. By predicting transformation parameters and transforming features to a canonical space, viewpoint invariance is achieved, with further improvement in feature discrimination on texture-less or repetitive zones. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and efficiency.
For visual relocalization from a single RGB image, scene coordinate regression (SCoRe) based on convolutional neural networks (CNNs) has become the prevailing approach. However, the fixed geometric structure of CNNs makes it difficult to extract features that remain invariant across viewpoints. In this letter, we propose a global context-guided spatial feature transformation (SFT) network that learns an invariant feature representation for robustness against viewpoint changes. Specifically, a global feature extracted from the source feature map is treated as a dynamic convolutional kernel and convolved with the source feature map to predict transformation parameters. The predicted parameters are used to transform features from multiple viewpoints into a canonical space under the constraint of a maximum likelihood-derived loss, thereby achieving viewpoint invariance. CoordConv is also employed to further improve the discrimination of features in texture-less or repetitive regions. The proposed SFT network can be easily incorporated into a general SCoRe network. To the best of our knowledge, this is the first time that features are explicitly decoupled from viewpoints in a SCoRe network by means of a spatial feature transformation network, which yields stable and accurate visual relocalization. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and efficiency.
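The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the global feature is extracted, how the kernel response is mapped to transformation parameters, or what form the transformation takes. The sketch assumes global average pooling, a 1x1 dynamic kernel, a hypothetical linear head (`head_weight`, `head_bias`), a 2x3 affine transformation, and nearest-neighbor resampling. The CoordConv step follows the standard idea of appending normalized coordinate channels. All function names are illustrative.

```python
import numpy as np


def add_coord_channels(feat):
    """CoordConv-style input: append normalized x and y channels to a (C, H, W) map."""
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")
    return np.concatenate([feat, xs[None], ys[None]], axis=0)


def predict_affine_params(feat, head_weight, head_bias):
    """Predict a 2x3 affine matrix from a (C, H, W) feature map.

    Assumptions (not from the source): the global feature is a global
    average pool, it acts as a 1x1 dynamic kernel, and a hypothetical
    linear head maps the kernel response to six affine parameters.
    """
    global_feat = feat.mean(axis=(1, 2))                        # (C,)
    # 1x1 dynamic convolution: weight every channel by the global feature.
    response = np.tensordot(global_feat, feat, axes=([0], [0]))  # (H, W)
    theta = head_weight @ response.ravel() + head_bias           # (6,)
    return theta.reshape(2, 3)


def warp_affine(feat, theta):
    """Resample a (C, H, W) map on an affine-transformed grid (nearest neighbor)."""
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])    # (3, H*W)
    src = theta @ grid                                           # (2, H*W)
    # Map normalized source coordinates back to integer pixel indices.
    sx = np.rint((src[0] + 1.0) * 0.5 * (w - 1)).astype(int)
    sy = np.rint((src[1] + 1.0) * 0.5 * (h - 1)).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((c, h * w), dtype=feat.dtype)
    out[:, valid] = feat[:, sy[valid], sx[valid]]                # zeros outside the map
    return out.reshape(c, h, w)


# Usage: with a zero head weight and an identity bias, the predicted
# transformation is the identity and the warped features are unchanged.
rng = np.random.default_rng(0)
source = add_coord_channels(rng.standard_normal((3, 4, 5)))
identity_bias = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
theta = predict_affine_params(source, np.zeros((6, 4 * 5)), identity_bias)
canonical = warp_affine(source, theta)
```

In a trained network the head would be learned jointly with the SCoRe backbone, and the resampling would need to be differentiable (for example bilinear) so that the loss can supervise the predicted parameters. Nearest-neighbor sampling is used here only to keep the sketch short.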
