Article

AGCosPlace: A UAV Visual Positioning Algorithm Based on Transformer

Journal

DRONES
Volume 7, Issue 8, Pages -

Publisher

MDPI
DOI: 10.3390/drones7080498

Keywords

UAV visual navigation; visual positioning; graph network; transformer

Abstract

To obtain the position of a drone even when the relative poses and intrinsics of its camera are unknown, a visual positioning algorithm based on image retrieval, called AGCosPlace, is proposed; it leverages the Transformer architecture to achieve improved performance. Our approach subjects the feature map of the backbone to an encoding operation that combines attention mechanisms, multi-layer perceptron coding, and a graph network module, which allows better aggregation of the contextual information present in the image. An aggregation module with dynamic adaptive pooling then produces a descriptor of appropriate dimensionality, which is passed to a classifier to recognize the position. Because labeling ground-truth positions for UAV images is complex and costly, the visual positioning network is trained on the publicly available Google Street View SF-XL dataset, and the trained model is evaluated on a custom UAV-perspective test set. The experimental results demonstrate that the proposed algorithm, which improves on the ResNet backbone networks on the SF-XL test set, also performs well on the UAV test set, achieving notable improvements in the four evaluation metrics R@1, R@5, R@10, and R@20. These results confirm that the trained visual positioning network can be employed effectively in UAV visual positioning tasks.
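To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the stages named in the abstract: a ResNet backbone feature map, an encoding step that aggregates context, pooling to a fixed-size descriptor, and a classification head. It is an illustrative reconstruction under stated assumptions, not the authors' implementation: the class name AGCosPlaceSketch and all dimensions are hypothetical, plain multi-head self-attention plus an MLP stands in for the paper's attention/MLP/graph-network encoding, and adaptive average pooling stands in for its dynamic adaptive pooling.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class AGCosPlaceSketch(nn.Module):
    """Illustrative sketch: backbone -> context encoding -> pooling -> descriptor -> classifier."""

    def __init__(self, descriptor_dim=512, num_classes=1000, channels=512):
        super().__init__()
        # ResNet backbone truncated before global pooling, so it yields a spatial feature map.
        backbone = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Self-attention over spatial positions: a stand-in for the paper's
        # attention + graph-network context aggregation.
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=8, batch_first=True)
        # Multi-layer perceptron coding applied to each spatial token.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels), nn.GELU(), nn.Linear(channels, channels)
        )
        # Pooling fixes the descriptor dimensionality regardless of feature-map size.
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.proj = nn.Linear(channels, descriptor_dim)
        # Classification head used during training to recognize the position class.
        self.classifier = nn.Linear(descriptor_dim, num_classes)

    def forward(self, images):
        fmap = self.backbone(images)                   # (B, C, H, W)
        tokens = fmap.flatten(2).transpose(1, 2)       # (B, H*W, C) spatial tokens
        tokens, _ = self.attn(tokens, tokens, tokens)  # aggregate context across positions
        tokens = tokens + self.mlp(tokens)             # MLP coding with a residual connection
        pooled = self.pool(tokens.transpose(1, 2)).squeeze(-1)     # (B, C)
        descriptor = nn.functional.normalize(self.proj(pooled), dim=-1)
        return descriptor, self.classifier(descriptor)


model = AGCosPlaceSketch()
desc, logits = model(torch.randn(2, 3, 224, 224))
print(desc.shape, logits.shape)  # torch.Size([2, 512]) torch.Size([2, 1000])
```

At retrieval time only the L2-normalized descriptor is needed; the classifier exists so the network can be trained as a classification problem over position classes, as in the CosPlace family of methods.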
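The four reported metrics follow the standard image-retrieval definition: R@K is the fraction of queries for which at least one of the top-K retrieved database images is a true match (in visual positioning, typically an image within some distance threshold of the query's location). A small NumPy sketch under that assumption; recall_at_k and is_match are hypothetical names, not taken from the paper:

```python
import numpy as np


def recall_at_k(query_desc, db_desc, is_match, ks=(1, 5, 10, 20)):
    """R@K: fraction of queries with at least one true match among the
    top-K nearest database descriptors (cosine similarity, assuming
    L2-normalized inputs). is_match[i, j] marks database image j as a
    correct result for query i, e.g. within a GPS distance threshold.
    """
    sims = query_desc @ db_desc.T        # (Q, N) similarity matrix
    order = np.argsort(-sims, axis=1)    # database indices ranked by similarity
    hits = np.take_along_axis(is_match, order, axis=1)
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}


# Toy example: 3 queries against 30 database images with random match labels.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 512));  q /= np.linalg.norm(q, axis=1, keepdims=True)
db = rng.normal(size=(30, 512)); db /= np.linalg.norm(db, axis=1, keepdims=True)
matches = rng.random((3, 30)) < 0.1
print(recall_at_k(q, db, matches))
```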
