Article

Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TCSVT.2021.3082939

Keywords

Dynamic cross-modal guided mechanism; RGB-D/RGB-T multi-modal data; information fusion; salient object detection

Funding

  1. Ministry of Science and Technology of China-Science and Technology Innovations 2030 [2019AAA0103501]
  2. Natural Science Foundation of China [61801303, 62031013]
  3. Guangdong Basic and Applied Basic Research Foundation [2019A1515012031]
  4. Shenzhen Science and Technology Plan Basic Research Project [JCYJ20190808161805519]

Abstract

The use of complementary information, such as depth or thermal data, has shown benefits for salient object detection. However, RGB-D and RGB-T salient object detection are currently solved independently, and existing approaches directly extract and fuse raw features. This work proposes a unified end-to-end framework that handles RGB-D and RGB-T salient object detection simultaneously, and introduces a multi-stage and multi-scale fusion network to effectively handle multi-modal features.
The use of complementary information, namely depth or thermal information, has shown its benefits for salient object detection (SOD) in recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most existing methods directly extract and fuse raw features from backbones. Such methods can be easily restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to simultaneously analyze RGB-D and RGB-T SOD tasks. Specifically, to effectively tackle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM aims to explore important feature representations in the feature response stage and integrate them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, and can further boost multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be used for diverse multi-modal SOD tasks. Comprehensive experiments (~92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that the proposed method works well on diverse multi-modal SOD tasks with good generalization and robustness, and it provides a good multi-modal SOD benchmark.
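The abstract names two components, a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD), without detailing their internals here. The PyTorch sketch below only illustrates the two ideas as described: a two-stage fusion (per-modality feature re-weighting followed by combination) and a decoder that passes information both coarse-to-fine and fine-to-coarse. The gating, convolution, and aggregation choices are assumptions made for the sketch, not the authors' actual architecture.

```python
# Minimal sketch (PyTorch) of the two components named in the abstract.
# CMFM/BMD names come from the abstract; every internal design choice below
# (channel gating, 3x3 merge conv, two-pass decoder) is an assumption made
# for illustration, not the authors' actual MMNet design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CMFM(nn.Module):
    """Cross-modal multi-stage fusion: re-weight each modality, then combine."""

    def __init__(self, channels: int):
        super().__init__()
        # Stage 1 ("feature response"): channel attention selects the
        # informative responses of the RGB and depth/thermal features.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.aux_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        # Stage 2 ("combination"): merge the re-weighted modalities.
        self.merge = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_rgb: torch.Tensor, f_aux: torch.Tensor) -> torch.Tensor:
        f_rgb = f_rgb * self.rgb_gate(f_rgb)
        f_aux = f_aux * self.aux_gate(f_aux)
        return F.relu(self.merge(torch.cat([f_rgb, f_aux], dim=1)))


class BMD(nn.Module):
    """Bi-directional multi-scale decoder: coarse-to-fine then fine-to-coarse."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        self.td_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_scales)
        )
        self.bu_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_scales)
        )
        self.predict = nn.Conv2d(channels, 1, 1)

    def forward(self, feats: list) -> torch.Tensor:
        # feats: fused features ordered fine -> coarse (e.g. strides 4, 8, 16).
        n = len(feats)
        # Top-down pass: each scale receives upsampled coarser (global) context.
        td, prev = [None] * n, None
        for i in reversed(range(n)):
            x = feats[i]
            if prev is not None:
                x = x + F.interpolate(prev, size=x.shape[-2:], mode="bilinear",
                                      align_corners=False)
            prev = td[i] = F.relu(self.td_convs[i](x))
        # Bottom-up pass: each scale receives downsampled finer (local) detail.
        bu, prev = [None] * n, None
        for i in range(n):
            x = td[i]
            if prev is not None:
                x = x + F.adaptive_avg_pool2d(prev, x.shape[-2:])
            prev = bu[i] = F.relu(self.bu_convs[i](x))
        # Aggregate both passes at the finest resolution into a saliency map.
        out = td[0]
        for b in bu:
            out = out + F.interpolate(b, size=td[0].shape[-2:], mode="bilinear",
                                      align_corners=False)
        return torch.sigmoid(self.predict(out))


if __name__ == "__main__":
    # Toy multi-scale features for one RGB-D (or RGB-T) image pair.
    rgb_feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
    aux_feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
    cmfm, bmd = CMFM(64), BMD(64, num_scales=3)
    fused = [cmfm(r, a) for r, a in zip(rgb_feats, aux_feats)]
    print(bmd(fused).shape)  # torch.Size([1, 1, 64, 64])
```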
