4.8 Article

TransFuser: Imitation With Transformer-Based Sensor Fusion for Autonomous Driving

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2022.3200245

Keywords

Laser radar; Transformers; Three-dimensional displays; Semantics; Sensor fusion; Cameras; Autonomous vehicles; Attention; autonomous driving; imitation learning; sensor fusion; transformers


This study proposes TransFuser, a method for integrating representations from images and LiDAR. By using a self-attention mechanism to fuse feature maps at multiple resolutions, the method achieves better performance in complex driving scenarios.
How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
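The fusion mechanism described in the abstract is straightforward to sketch: pool the perspective-view (image) and bird's eye view (LiDAR) feature maps down to a small token grid, run self-attention over the concatenated tokens, then upsample the fused tokens and add them back into each branch, repeating this at several backbone resolutions. Below is a minimal PyTorch sketch of one such fusion block; the class name FusionBlock, the pooled grid size, and all hyperparameters are illustrative assumptions rather than the authors' exact configuration (their reference code is at github.com/autonomousvision/transfuser).

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    # Fuses an image feature map and a LiDAR BEV feature map by running
    # self-attention over their concatenated tokens.
    def __init__(self, channels: int, num_heads: int = 4, pooled: int = 8):
        super().__init__()
        self.pooled = pooled  # side length of the pooled token grid
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        # Learned positional embedding for the 2 * pooled^2 tokens.
        self.pos = nn.Parameter(torch.zeros(1, 2 * pooled * pooled, channels))

    def forward(self, img, lidar):
        # img:   (B, C, Hi, Wi) perspective-view features
        # lidar: (B, C, Hl, Wl) bird's eye view features
        B, C = img.shape[:2]
        p = self.pooled
        # Pool both maps to a small fixed grid so attention stays cheap.
        img_t = F.adaptive_avg_pool2d(img, p).flatten(2).transpose(1, 2)
        lid_t = F.adaptive_avg_pool2d(lidar, p).flatten(2).transpose(1, 2)
        tokens = torch.cat([img_t, lid_t], dim=1) + self.pos
        fused = self.transformer(tokens)  # (B, 2*p*p, C)
        # Split per modality, restore spatial layout, upsample, add residually.
        img_f, lid_f = fused.split(p * p, dim=1)
        img_f = img_f.transpose(1, 2).reshape(B, C, p, p)
        lid_f = lid_f.transpose(1, 2).reshape(B, C, p, p)
        img = img + F.interpolate(img_f, size=img.shape[2:],
                                  mode="bilinear", align_corners=False)
        lidar = lidar + F.interpolate(lid_f, size=lidar.shape[2:],
                                      mode="bilinear", align_corners=False)
        return img, lidar

# Example: fuse features from one stage of each backbone.
block = FusionBlock(channels=64)
img_feat = torch.randn(2, 64, 32, 128)   # perspective-view features
lidar_feat = torch.randn(2, 64, 32, 32)  # BEV features
img_out, lidar_out = block(img_feat, lidar_feat)

Pooling to a fixed grid before attention keeps the token count, and hence the quadratic attention cost, independent of the input resolution; applying such a block at multiple backbone stages is what lets the fusion combine both fine and coarse spatial context.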

Authors

Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, Andreas Geiger
