4.7 Article

Region-adaptive and context-complementary cross modulation for RGB-T semantic segmentation

期刊

PATTERN RECOGNITION
卷 147, 期 -, 页码 -

出版社

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2023.110092

关键词

RGB-Thermal; Semantic segmentation; Region-Adaptive Channel Modulation; Context-Complementary Spatial Modulation

向作者/读者索取更多资源

RGB-Thermal (RGB-T) semantic segmentation is an emerging task that aims to improve the robustness of segmentation methods under extreme imaging conditions by using thermal infrared modality. The challenges of foreground-background distinguishment and complementary information mining are addressed by proposing a cross modulation process with two collaborative components. Experimental results show that the proposed method achieves state-of-the-art performances on current RGB-T segmentation benchmarks.
RGB-Thermal (RGB-T) semantic segmentation is an emerging task aiming to improve the robustness of seg-mentation methods under extreme imaging conditions with the aid of thermal infrared modality. Foreground- background distinguishment and complementary information mining are two key challenges of this task. Recent methods use naive channel attention and cross-attention to tackle these challenges, but they still struggle with a sub-optimal solution where salient foreground features and noisy background ones might be equally modulated without distinction. The quadratic computational overhead of cross-attention also blocks its application on high-resolution features. Moreover, lacking complementary information mining in the encoding phase hinders the comprehensive scene encoding as well. To alleviate these limitations, we propose a cross modulation process with two collaborative components. The first Region-Adaptive Channel Modulation (RACM) module conducts channel attention at a fine-grained region level where foreground and background regions can be modulated differently in each channel. The second Context-Complementary Spatial Modulation (CCSM) module mines and transfers complementary information between the two modalities early in the encoding phase. Experiments show that our method achieves state-of-the-art performances on current RGB-T segmentation benchmarks.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据