Article

NDNet: Spacewise Multiscale Representation Learning via Neighbor Decoupling for Real-Time Driving Scene Parsing

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TNNLS.2022.3221745

Keywords

Autonomous driving; neighbor decoupling (ND); real-time system; scene parsing; semantic segmentation

Funding

  1. National Key Research and Development Program of China [2020AAA0108100]
  2. National Natural Science Foundation of China [61733013]


This article proposes NDNet, a real-time driving scene parsing framework that uses spacewise neighbor decoupling (ND) and neighbor coupling (NC) to achieve both high-quality semantic segmentation and real-time performance. Experiments on the Cityscapes dataset show that it outperforms competing methods.
As a safety-critical application, autonomous driving requires both high-quality semantic segmentation and real-time performance for deployment. Existing methods commonly suffer from information loss and a heavy computational burden caused by high-resolution input and output and by multiscale learning schemes, which conflicts with real-time requirements. In contrast to the channelwise information modeling commonly adopted by modern networks, in this article we propose a novel real-time driving scene parsing framework named NDNet from the perspective of spacewise neighbor decoupling (ND) and neighbor coupling (NC). We first define and implement the reversible operations ND and NC, which realize lossless resolution conversion for complementary thumbnail sampling and collation to facilitate spatial modeling. Based on ND and NC, we further propose three modules, namely, the local capturer and global dependence builder (LCGB), the spacewise multiscale feature extractor (SMFE), and the high-resolution semantic generator (HSG), which together form the NDNet pipeline. The LCGB serves as a stem block that preprocesses the large-scale input for fast but lossless resolution reduction and extracts initial features with global context. The SMFE then performs dense feature extraction and obtains rich multiscale features in the spatial dimension with less computational overhead. For high-resolution semantic output, the HSG is designed for fast resolution reconstruction and adaptive amending of semantic confusion. Experiments show the superiority of the proposed method: NDNet achieves state-of-the-art performance on the Cityscapes benchmark, reaching 76.47% mIoU at 240+ frames/s and 78.8% mIoU at 150+ frames/s. Code is available at https://github.com/LiShuTJ/NDNet.
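The abstract describes ND and NC only at a high level. The sketch below assumes that ND splits a feature map into s×s complementary thumbnails by strided sampling (similar to pixel-unshuffle or space-to-depth, but keeping the thumbnails as separate maps instead of concatenating them along channels) and that NC interleaves them back. The names `neighbor_decouple`, `neighbor_couple`, and `conv2d_same` and the stride `s` are hypothetical, and the official code in the repository above may differ. The first check shows why such an operation can be lossless. The second illustrates one general reason spatial decoupling can give wider context cheaply: a shared k×k filter applied to every thumbnail, followed by NC, equals the same filter with dilation s on the full-resolution map (the space-to-batch trick behind atrous convolution). This is not a description of the SMFE itself.

```python
import numpy as np


def neighbor_decouple(x: np.ndarray, s: int = 2) -> np.ndarray:
    """Hypothetical ND: split an (H, W, ...) map into s*s thumbnails of size (H/s, W/s, ...)."""
    h, w = x.shape[:2]
    if h % s or w % s:
        raise ValueError("H and W must be divisible by s")
    # Thumbnail (i, j) keeps every s-th pixel starting at offset (i, j), so the
    # s*s thumbnails are complementary: together they hold every input pixel once.
    return np.stack([x[i::s, j::s] for i in range(s) for j in range(s)], axis=0)


def neighbor_couple(t: np.ndarray, s: int = 2) -> np.ndarray:
    """Hypothetical NC: interleave s*s thumbnails back into one full-resolution map."""
    n, h, w = t.shape[:3]
    if n != s * s:
        raise ValueError("expected s*s thumbnails")
    out = np.empty((h * s, w * s) + t.shape[3:], dtype=t.dtype)
    for k in range(n):
        i, j = divmod(k, s)
        out[i::s, j::s] = t[k]  # put thumbnail k back at its sampling offset
    return out


def conv2d_same(x: np.ndarray, k: np.ndarray, dilation: int = 1) -> np.ndarray:
    """Zero-padded, same-size 2-D cross-correlation with an odd-sized kernel."""
    kh, kw = k.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * xp[a * dilation:a * dilation + h, b * dilation:b * dilation + w]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))

# 1) ND followed by NC is lossless: the round trip reproduces the input exactly.
thumbs = neighbor_decouple(x, s=2)
restored = neighbor_couple(thumbs, s=2)
print(thumbs.shape, np.array_equal(restored, x))

# 2) A shared 3x3 filter on each thumbnail, then NC, equals a dilation-2 filter
#    on the full map: each thumbnail pixel sees a 2x wider neighborhood.
img, kern = x[..., 0], rng.standard_normal((3, 3))
via_nd = neighbor_couple(np.stack([conv2d_same(t, kern) for t in neighbor_decouple(img, 2)]), 2)
print(np.allclose(via_nd, conv2d_same(img, kern, dilation=2)))
```

Running it prints the thumbnail shape `(4, 4, 4, 3)` followed by `True` twice. Unlike pooling or strided convolution, this kind of resolution reduction discards no pixels, which is consistent with the abstract's description of ND as lossless.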
