Article

MFFENet: Multiscale Feature Fusion and Enhancement Network For RGB-Thermal Urban Road Scene Parsing

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 24, Pages 2526-2538

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3086618

Keywords

Hafnium; Autonomous driving; urban road scene; semantic segmentation; computer vision; thermal image; multi-task supervision

Funding

  1. National Natural Science Foundation of China [61502429, 61972357]
  2. Zhejiang Provincial Natural Science Foundation of China [LY18F020012]

In this paper, a novel multiscale feature fusion and enhancement network (MFFENet) is proposed for accurate parsing of RGB-thermal urban road scenes. By incorporating multi-label supervision and a spatial attention mechanism module, the MFFENet outperforms similar high-performing methods and emphasizes foreground objects in the scene.
Compared with traditional handcrafted features, deep learning has greatly improved the performance of scene parsing. However, scene parsing remains challenging under varied environmental conditions because of imaging limitations. Thermal imaging cameras have several advantages over visible-spectrum cameras, such as operation in total darkness, robustness to shadow effects, insensitivity to illumination variations, and a strong ability to penetrate smog and haze. These advantages make thermal imaging cameras well suited to parsing semantic objects in both daytime and nighttime scenes. In this paper, we propose a novel multiscale feature fusion and enhancement network (MFFENet) for accurate parsing of RGB-thermal urban road scenes even when the quality of the available RGB data is compromised. The proposed MFFENet consists of two encoders, a feature fusion layer, and a multi-label supervision layer. We concatenate the multi-scale features with the features that contain global semantic information. Furthermore, we explore the cross-modal fusion of RGB and thermal features at multiple stages, rather than fusing them once at the low or high stage. We then propose a spatial attention mechanism module that assigns a higher weight to the foreground area, allowing MFFENet to emphasize foreground objects. Finally, multi-label supervision is introduced to optimize the parameters of the proposed MFFENet. Experimental results confirm that the proposed MFFENet outperforms similar high-performing methods.
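
The abstract names two ideas that a small example can make concrete: fusing RGB and thermal features at every encoder stage, and re-weighting the fused features with a spatial attention map. The sketch below is only an illustration under stated assumptions, not the paper's implementation. The abstract does not specify the fusion operator or the attention design, so element-wise addition and a parameter-free channel-pooling attention (mean plus max across channels, followed by a sigmoid) are stand-ins, and all function names are hypothetical. Feature maps are plain nested lists shaped [C][H][W] to keep the example dependency-free.

```python
import math

def fuse_stage(rgb, thermal):
    """Fuse one encoder stage by element-wise addition (an assumed operator).

    Both inputs are [C][H][W] nested lists with identical shapes.
    """
    return [[[r + t for r, t in zip(r_row, t_row)]
             for r_row, t_row in zip(r_ch, t_ch)]
            for r_ch, t_ch in zip(rgb, thermal)]

def spatial_attention(feat):
    """Return an [H][W] map of weights in (0, 1), one weight per pixel.

    Assumption: a learned module would normally mix the pooled channels
    with a convolution; here they are simply summed before the sigmoid.
    """
    channels = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    weights = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [feat[c][y][x] for c in range(channels)]
            score = sum(values) / channels + max(values)  # mean + max pooling
            row.append(1.0 / (1.0 + math.exp(-score)))    # sigmoid
        weights.append(row)
    return weights

def apply_attention(feat, weights):
    """Scale every channel of feat by the shared [H][W] attention map."""
    return [[[v * w for v, w in zip(f_row, w_row)]
             for f_row, w_row in zip(ch, weights)]
            for ch in feat]

def fuse_and_enhance(rgb_stages, thermal_stages):
    """Fuse at every stage (not just once), then emphasize strong locations."""
    enhanced = []
    for rgb, thermal in zip(rgb_stages, thermal_stages):
        fused = fuse_stage(rgb, thermal)
        enhanced.append(apply_attention(fused, spatial_attention(fused)))
    return enhanced

# Toy input: one stage, 2 channels, a 1x2 feature map per modality.
rgb_stages = [[[[1.0, 0.0]], [[1.0, 0.0]]]]
thermal_stages = [[[[1.0, 0.0]], [[3.0, 0.0]]]]
result = fuse_and_enhance(rgb_stages, thermal_stages)
```

In the toy input, the left pixel has strong responses in both modalities and receives a weight close to 1, while the right pixel has no response and receives the neutral weight 0.5, so locations with strong fused activations are emphasized relative to the rest.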
