Article

SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer

Journal

IEEE/CAA Journal of Automatica Sinica
Volume 9, Issue 7, Pages 1200-1217

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/JAS.2022.105686

Keywords

Cross-domain long-range learning; image fusion; Swin Transformer

Funding

  1. National Natural Science Foundation of China [62075169, 62003247, 62061160370]
  2. Key Research and Development Program of Hubei Province [2020BAB113]

Abstract

This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer, termed SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network fully implements domain-specific information extraction and cross-domain complementary information integration while maintaining appropriate apparent intensity from a global perspective. In particular, we introduce the shifted-windows mechanism into the self-attention and cross-attention, which allows our model to accept images of arbitrary sizes. On the other hand, multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information while presenting optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion over state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.
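
The inter-domain fusion unit described in the abstract can be pictured with a minimal PyTorch sketch: queries come from one domain while keys and values come from the other, so each token attends to complementary information across domains. This is illustrative only; it uses single-head attention without the shifted-window partitioning or relative position bias, and the class name CrossDomainFusion is hypothetical rather than taken from the authors' repository.

```python
# Minimal sketch of an inter-domain fusion unit based on cross-attention.
import torch
import torch.nn as nn

class CrossDomainFusion(nn.Module):
    """Hypothetical single-head cross-attention: domain A queries domain B."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)       # queries from domain A
        self.to_kv = nn.Linear(dim, 2 * dim)  # keys/values from domain B
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a, x_b: (batch, tokens, dim) token sequences from the two domains
        q = self.to_q(x_a)
        k, v = self.to_kv(x_b).chunk(2, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (batch, tokens, tokens)
        attn = attn.softmax(dim=-1)
        # Every domain-A token aggregates complementary domain-B features.
        return self.proj(attn @ v)

# Usage: fuse infrared tokens with visible tokens at embedding dim 96.
fused = CrossDomainFusion(96)(torch.randn(1, 64, 96), torch.randn(1, 64, 96))
```

Likewise, the three-term loss can be sketched as follows. The exact gradient operator, aggregation, and weights in the paper may differ; the max-based targets and unit weights below are plausible choices for this family of losses, and ssim_fn is a caller-supplied SSIM function rather than a specific library API.

```python
import torch
import torch.nn.functional as F

def sobel_grad(img: torch.Tensor) -> torch.Tensor:
    # Gradient magnitude via fixed Sobel kernels; single-channel images assumed.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(-2, -1)
    gx = F.conv2d(img, kx.to(img), padding=1)
    gy = F.conv2d(img, ky.to(img), padding=1)
    return gx.abs() + gy.abs()

def fusion_loss(fused, src1, src2, ssim_fn, w_texture=1.0, w_intensity=1.0):
    # L = L_SSIM + w_texture * L_texture + w_intensity * L_intensity
    l_ssim = 0.5 * (1 - ssim_fn(fused, src1)) + 0.5 * (1 - ssim_fn(fused, src2))
    # Texture: the fused gradient should follow the stronger source gradient.
    l_texture = F.l1_loss(sobel_grad(fused),
                          torch.max(sobel_grad(src1), sobel_grad(src2)))
    # Intensity: the fused image should track the brighter source pixel.
    l_intensity = F.l1_loss(fused, torch.max(src1, src2))
    return l_ssim + w_texture * l_texture + w_intensity * l_intensity
```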
