☆ 4.7 Article

Cross-Collaborative Fusion-Encoder Network for Robust RGB-Thermal Salient Object Detection

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

期刊

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

卷 32, 期 11, 页码 7646-7661

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TCSVT.2022.3184840

关键词

Feature extraction; Encoding; Decoding; Saliency detection; Object detection; Noise measurement; Task analysis; RGB-thermal salient object detection; fusion-encoder; cross-collaborative

类别

Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This article proposes a novel CCFENet model for RGB-T salient object detection. The model addresses the issue of defective modalities using a cross-collaboration enhancement strategy (CCE) and aggregates multi-level complementary multi-modal features using a cross-scale cross-modal decoder (CCD). Experimental results demonstrate that CCFENet outperforms existing models on multiple datasets.

With the prevalence of thermal cameras, RGB-T multi-modal data have become more available for salient object detection (SOD) in complex scenes. Most RGB-T SOD works first individually extract RGB and thermal features from two separate encoders and directly integrate them, which pay less attention to the issue of defective modalities. However, such an indiscriminate feature extraction strategy may produce contaminated features and thus lead to poor SOD performance. To address this issue, we propose a novel CCFENet for a perspective to perform robust and accurate multi-modal expression encoding. First, we propose an essential cross-collaboration enhancement strategy (CCE), which concentrates on facilitating the interactions across the encoders and encouraging different modalities to complement each other during encoding. Such a cross-collaborative-encoder paradigm induces our network to collaboratively suppress the negative feature responses of defective modality data and effectively exploit modality-informative features. Moreover, as the network goes deeper, we embed several CCEs into the encoder, further enabling more representative and robust feature generation. Second, benefiting from the proposed robust encoding paradigm, a simple yet effective cross-scale cross-modal decoder (CCD) is designed to aggregate multi-level complementary multi-modal features, and thus encourages efficient and accurate RGB-T SOD. Extensive experiments reveal that our CCFENet outperforms the state-of-the-art models on three RGB-T datasets with a fast inference speed of 62 FPS. In addition, the advantages of our approach in complex scenarios (e.g., bad weather, motion blur, etc.) and RGB-D SOD further verify its robustness and generality. The source code will be publicly available via our project page: https://git.openi.org.cn/OpenVision/CCFENet.

Cross-Collaborative Fusion-Encoder Network for Robust RGB-Thermal Salient Object Detection

期刊

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Cross-Collaborative Fusion-Encoder Network for Robust RGB-Thermal Salient Object Detection

期刊

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文