Article

MFFENet: Multiscale Feature Fusion and Enhancement Network For RGB-Thermal Urban Road Scene Parsing

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 24, Pages 2526-2538

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2021.3086618

Keywords

Hafnium; Autonomous driving; urban road scene; semantic segmentation; computer vision; thermal image; multi-task supervision

Funding

National Natural Science Foundation of China [61502429, 61972357]
Zhejiang Provincial Natural Science Foundation of China [LY18F020012]

Automated Summary

In this paper, a novel multiscale feature fusion and enhancement network (MFFENet) is proposed for accurate parsing of RGB-thermal urban road scenes. By incorporating multi-label supervision and a spatial attention mechanism module, the MFFENet outperforms similar high-performing methods and emphasizes foreground objects in the scene.

Abstract

Compared with traditional handcrafted features, deep learning has greatly improved the performance of scene parsing. However, it remains challenging under various environmental conditions caused by imaging limitations. Thermal imaging cameras have several advantages over cameras for the visible spectrum, such as operation in total darkness, robustness to shadow effects, insensitivity to illumination variations, and strong ability to penetrate smog and haze. These advantages of thermal imaging cameras make them ideal for the scene parsing of semantic objects in daytime and nighttime. In this paper, we propose a novel multiscale feature fusion and enhancement network (MFFENet) for accurate parsing of RGB-thermal urban road scenes even when the quality of the available RGB data is compromised. The proposed MFFENet consists of two encoders, a feature fusion layer, and a multi-label supervision layer. We concatenate the multi-scale features with the features that contain global semantic information. Furthermore, we explore the cross-modal fusion of RGB and thermal features at multiple stages, rather than fusing them once at the low or high stage. Then, we propose a spatial attention mechanism module that provides a higher weight to (focuses more on) the foreground area, allowing MFFENet to emphasize foreground objects. Finally, multi-label supervision is introduced to optimize parameters of the proposed MFFENet. Experimental results confirm that the proposed MFFENet outperforms similar high-performing methods.
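The abstract names two ideas that a small sketch can make concrete: fusing RGB and thermal features at several encoder stages rather than once, and a spatial attention gate that re-weights locations so that some areas count more than others. The NumPy code below is only an illustration under stated assumptions. The abstract does not specify the fusion operator, the attention formula, or the feature shapes, so the element-wise sum in `fuse_stage`, the mean-plus-max gate in `spatial_attention`, and the three stage shapes are all hypothetical choices, not the authors' design.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse_stage(rgb_feat, thermal_feat):
    """Fuse two same-shaped (C, H, W) feature maps.

    Assumption: element-wise sum. The abstract does not state the operator.
    """
    return rgb_feat + thermal_feat


def spatial_attention(feat):
    """Re-weight every spatial location of a (C, H, W) feature map.

    Assumption: the gate is a sigmoid over the sum of the channel-wise mean
    and channel-wise max. A trained network would learn this mapping.
    """
    avg_map = feat.mean(axis=0)          # (H, W)
    max_map = feat.max(axis=0)           # (H, W)
    gate = sigmoid(avg_map + max_map)    # (H, W), values in (0, 1)
    return feat * gate[None, :, :], gate


# Random stand-ins for encoder outputs at three stages (hypothetical shapes).
rng = np.random.default_rng(0)
stage_shapes = [(8, 32, 32), (16, 16, 16), (32, 8, 8)]
rgb_feats = [rng.standard_normal(s) for s in stage_shapes]
thermal_feats = [rng.standard_normal(s) for s in stage_shapes]

# Cross-modal fusion at every stage, followed by the spatial gate.
fused = [fuse_stage(r, t) for r, t in zip(rgb_feats, thermal_feats)]
attended, gates = zip(*(spatial_attention(f) for f in fused))
```

Each entry of `attended` keeps the shape of its stage, and each entry of `gates` is a single-channel map over that stage's spatial grid. In a real model the gate would be produced by learned layers and trained with the segmentation loss.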
