Article

Interactive Multi-Scale Fusion of 2D and 3D Features for Multi-Object Vehicle Tracking

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2023)

Journal

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TITS.2023.3275954

Keywords

Multi-object tracking; 3D point clouds; feature fusion; computer vision; deep learning

Automated Summary

Multiple Object Tracking (MOT) is an important task in autonomous driving, but relying on a single sensor is not robust enough. Texture information from RGB cameras and 3D structure information from LiDAR each have advantages in different situations. Therefore, fusing features from multiple modalities helps in learning discriminative features. However, effective feature fusion is challenging because the modalities carry entirely different kinds of information.

Abstract

Multiple Object Tracking (MOT) is a significant task in autonomous driving. Nonetheless, relying on a single sensor is not robust enough, because any one modality tends to fail in some challenging situations. Texture information from RGB cameras and 3D structure information from Light Detection and Ranging (LiDAR) each have advantages under different circumstances. Therefore, feature fusion across multiple modalities contributes to the learning of discriminative features. However, achieving effective feature fusion is nontrivial because the information modalities are completely distinct. Previous fusion methods usually fuse only the top-level features, after the backbones have extracted features from the different modalities. Feature fusion thus happens only once, which limits the information interaction between modalities. In this paper, we propose multi-scale interactive query and fusion between pixel-wise and point-wise features to obtain more discriminative features. In addition, an attention mechanism is used to conduct soft feature fusion between multiple pixels and points, avoiding the inaccurate matching problems of previous single pixel-point fusion methods. We introduce PointNet++ to obtain multi-scale deep representations of point clouds and adapt it to our proposed interactive feature fusion between multi-scale features of images and point clouds. Through the interaction module, each modality can integrate more complementary information from the other modality. We also explore the effectiveness of pre-training on each single modality and fine-tuning on the fusion-based model. Our method achieves 90.32% MOTA and 72.44% HOTA on the KITTI benchmark and outperforms other approaches that do not use multi-scale soft feature fusion.
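The abstract's idea of attention-based soft fusion between multiple pixels and points can be illustrated with a small sketch. This is not the authors' implementation: the function name soft_fuse, the projection matrices w_q, w_k and w_v, the feature sizes, and the choice of scaled dot-product attention followed by concatenation are all assumptions made for illustration. Each point feature queries every pixel feature, so a point receives a weighted mixture of pixel features rather than the feature of one hard-matched pixel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_fuse(point_feats, pixel_feats, w_q, w_k, w_v):
    """Hypothetical soft fusion of point features with pixel features.

    point_feats: (N, Dp) point-wise features at one scale
    pixel_feats: (M, Di) pixel-wise features at the same scale
    w_q, w_k, w_v: projection matrices (assumed learned; random here)
    """
    q = point_feats @ w_q                      # (N, d) queries from points
    k = pixel_feats @ w_k                      # (M, d) keys from pixels
    v = pixel_feats @ w_v                      # (M, d) values from pixels
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) point-pixel affinities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    attended = weights @ v                     # (N, d) soft pixel mixture per point
    # Concatenation is one simple way to combine the two modalities.
    fused = np.concatenate([point_feats, attended], axis=-1)
    return fused, weights

# Toy sizes, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
N, M, Dp, Di, d = 5, 10, 8, 16, 4
point_feats = rng.normal(size=(N, Dp))
pixel_feats = rng.normal(size=(M, Di))
w_q = rng.normal(size=(Dp, d))
w_k = rng.normal(size=(Di, d))
w_v = rng.normal(size=(Di, d))

fused, weights = soft_fuse(point_feats, pixel_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In a multi-scale setting, one would presumably apply such a step at each feature scale and in both directions (points querying pixels and pixels querying points); the abstract does not specify those details, so they are left out here.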
