Article

Complementarity-aware cross-modal feature fusion network for RGB-T semantic segmentation

Journal

PATTERN RECOGNITION
Volume 131

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108881

Keywords

RGB-T; Cross-modal fusion; Multi-supervision; Semantic segmentation

Funding

  1. General Project of the National Natural Science Foundation of China [61976094]
  2. Natural Science Foundation of Guangdong Province [2021A1515011349]

Abstract

RGB-T semantic segmentation has gained attention for its robustness under challenging illumination. This paper proposes a Complementarity-aware Cross-modal Feature Fusion Network (CCFFNet) that selects and fuses complementary information from RGB and thermal features. Experimental results show that the proposed model outperforms state-of-the-art models and can be easily applied to multi-modal semantic segmentation.
RGB-T semantic segmentation has attracted growing attention because it makes a model robust under challenging illumination. Most existing methods fuse RGB and thermal information equally along spatial dimensions, which results in feature redundancy and weakens the discriminability of cross-modal features. In this paper, we propose a Complementarity-aware Cross-modal Feature Fusion Network (CCFFNet) consisting of a Complementarity-Aware Encoder (CAE) and a Three-Path Fusion and Supervision (TPFS) module. The CAE, built from cascaded cross-modal fusion modules, selects complementary information from RGB and thermal features via a novel gate and fuses them through a channel-wise weighting mechanism. TPFS not only iteratively performs Three-Path Fusion (TPF) to further enhance cross-modal features, but also supervises the training of CCFFNet along three branches via Three-Supervision (TS). Extensive experiments demonstrate that our model outperforms state-of-the-art models by at least 1.6% mIoU on the MFNet dataset and 2.9% mIoU on the PST900 dataset, respectively. Moreover, a single-modality model can be easily applied to multi-modal semantic segmentation by plugging in our CAE. (c) 2022 Elsevier Ltd. All rights reserved.
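The fusion idea summarized above (a gate that selects complementary information from the two modalities, followed by channel-wise weighted fusion) can be sketched in a few lines. The following is a minimal, hypothetical PyTorch sketch for illustration only; the class name GatedCrossModalFusion, the squeeze-and-excitation-style gate, and the tensor shapes are assumptions and are not taken from the paper's released code.

```python
import torch
import torch.nn as nn


class GatedCrossModalFusion(nn.Module):
    """Illustrative gated, channel-wise weighted fusion of RGB and thermal features.

    A hypothetical sketch of the mechanism described in the abstract
    (gate selects complementary channels, channel-wise weights fuse them);
    not the authors' implementation.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Gate: predicts per-channel weights from the concatenated RGB/thermal features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # Concatenate along channels and derive per-channel gates for each modality.
        joint = torch.cat([rgb, thermal], dim=1)
        weights = self.gate(joint)                  # shape (B, 2C, 1, 1)
        w_rgb, w_thermal = weights.chunk(2, dim=1)  # split back per modality
        # Channel-wise weighting keeps complementary channels, suppresses redundant ones.
        return w_rgb * rgb + w_thermal * thermal


if __name__ == "__main__":
    fuse = GatedCrossModalFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 32, 32)
    thermal_feat = torch.randn(2, 64, 32, 32)
    print(fuse(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 32, 32])
```

Such a module could, in principle, be inserted between the RGB and thermal branches of an otherwise single-modality encoder, which matches the abstract's claim that a single-modality model can be extended to multi-modal segmentation by plugging in the CAE.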
