4.7 Article

ConvMLP-Mixer Based Real-Time Stereo Matching Network Towards Autonomous Driving

Journal

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY
Volume 72, Issue 2, Pages 2581-2586

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TVT.2022.3206612

Keywords

Autonomous driving; convMLP-mixer; group-wise correlation; patch embeddings; stereo matching

This paper proposes CMNet, a lightweight stereo matching architecture that improves the trade-off between speed and accuracy on resource-limited devices. It introduces a novel feature extraction network consisting of a patch embedding layer and a ConvMLP-mixer, which enhances the feature vectors and increases the accuracy of the disparity map. The absolute difference volume is concatenated with the group-wise correlation volume to provide multi-dimensional matching cost information for cost aggregation. CMNet achieves a state-of-the-art trade-off between speed and accuracy on the KITTI 2012 and KITTI 2015 stereo matching datasets.
Stereo matching is a classical problem in computer vision and has been widely used in many fields in recent years, especially autonomous driving. Speed and accuracy are both desirable in autonomous driving, but they conflict with each other. In this paper, we present CMNet, a lightweight stereo matching architecture that improves the trade-off between speed and accuracy on resource-limited devices. We propose a novel feature extraction network consisting of a patch embedding layer and a ConvMLP-mixer. The patch embedding layer enlarges the receptive field and makes the feature vectors compact. The ConvMLP-mixer increases the accuracy of the disparity map by mixing spatial information in the channel dimension. The absolute difference volume is concatenated with the group-wise correlation volume to provide multi-dimensional matching cost information for the cost aggregation stage. Evaluated on the KITTI 2012 and KITTI 2015 stereo matching datasets, CMNet has an inference time of 8.7 ms on an NVIDIA GTX 2080ti GPU. While delivering predictions faster than real time, it attains a D1-all of 3.41% on KITTI 2012 and 3.84% on KITTI 2015, a state-of-the-art trade-off between speed and accuracy. In addition, the lightweight architecture of CMNet achieves an inference time of 40.7 ms on an NVIDIA Jetson Nano, enabling real-time applications on edge devices.
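
The cost-volume step described in the abstract (an absolute difference volume concatenated with a group-wise correlation volume) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `build_cost_volume`, the `(C, H, W)` feature layout, the zero padding for columns without a valid match, and the choice to keep all `C` channels in the difference volume are assumptions made for illustration. Group-wise correlation is taken here in its usual form, the mean of per-channel products within each channel group.

```python
import numpy as np

def build_cost_volume(left, right, max_disp, num_groups):
    """Hypothetical sketch of a concatenated cost volume.

    left, right: feature maps of shape (C, H, W) (assumed layout).
    Returns an array of shape (num_groups + C, max_disp, H, W):
    group-wise correlation channels first, absolute difference channels after.
    """
    C, H, W = left.shape
    assert C % num_groups == 0, "channels must split evenly into groups"
    ch_per_group = C // num_groups

    gwc = np.zeros((num_groups, max_disp, H, W), dtype=left.dtype)
    diff = np.zeros((C, max_disp, H, W), dtype=left.dtype)

    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        # Columns x < d have no valid match and stay zero (assumed padding).
        l = left[:, :, d:]
        r = right[:, :, :W - d]

        # Group-wise correlation: mean of channel products inside each group.
        prod = (l * r).reshape(num_groups, ch_per_group, H, W - d)
        gwc[:, d, :, d:] = prod.mean(axis=1)

        # Absolute difference volume: per-channel |left - right|.
        diff[:, d, :, d:] = np.abs(l - r)

    # Concatenate along the channel axis for the cost aggregation stage.
    return np.concatenate([gwc, diff], axis=0)

# Usage with random features; sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
left_feat = rng.standard_normal((8, 4, 16)).astype(np.float32)
right_feat = rng.standard_normal((8, 4, 16)).astype(np.float32)
volume = build_cost_volume(left_feat, right_feat, max_disp=4, num_groups=2)
print(volume.shape)
```

In this sketch the correlation channels summarize feature similarity per group while the difference channels retain per-channel dissimilarity, which is one way to read the "multi-dimensional matching cost information" mentioned above.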
