4.7 Article

SwinSUNet: Pure Transformer Network for Remote Sensing Image Change Detection

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TGRS.2022.3160007

Keywords

Transformers; Task analysis; Feature extraction; Merging; Convolution; Decoding; Semantics; Change detection (CD); deep learning; remote sensing image; transformer

Funding

  1. Tianshan Innovation Team of Xinjiang Uygur Autonomous Region [2020D14044]
  2. National Science Foundation of China [U1903213]

This article presents a pure Transformer network called SwinSUNet for remote sensing image change detection. SwinSUNet utilizes the global information extraction ability of Transformers and employs an encoder, fusion module, and decoder to achieve change detection and localization.
Convolutional neural networks (CNNs) can extract effective semantic features, so they have been widely used for remote sensing image change detection (CD) in recent years. CNNs have achieved great success in CD, but because of the intrinsic locality of the convolution operation, they cannot capture global information in space-time. The transformer, proposed in recent years, can effectively extract global information, so it has been applied to computer vision (CV) tasks with remarkable success. In this article, we design a pure transformer network with a Siamese U-shaped structure to solve CD problems and name it SwinSUNet. SwinSUNet contains an encoder, a fusion module, and a decoder, all of which use Swin transformer blocks as basic units. The encoder has a Siamese structure based on the hierarchical Swin transformer, so it can process bitemporal images in parallel and extract their multiscale features; it uses patch merging and Swin transformer blocks to generate effective semantic features. The fusion module is mainly responsible for merging the bitemporal features generated by the encoder. Like the encoder, the decoder is based on the hierarchical Swin transformer. Unlike the encoder, it uses upsampling and merging (UM) blocks and Swin transformer blocks to recover the details of the change information. After these three modules run in sequence, SwinSUNet outputs the change maps. We conducted extensive experiments on four CD datasets, in which SwinSUNet achieved better results than other related methods.
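The encoder, fusion, decoder flow described in the abstract can be illustrated by tracing feature-map shapes through a Siamese U-shaped network. The following is only a minimal sketch, not the authors' implementation. It assumes the common Swin-T defaults (patch size 4, embedding dimension 96, four stages), assumes that fusion keeps the per-stage shape, and assumes that each UM step doubles the resolution, halves the channels, and merges with an encoder feature of matching shape. The abstract states none of these details, and all function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FeatureShape:
    """Shape of a feature map: spatial size in tokens and channel count."""
    height: int
    width: int
    channels: int


def patch_partition(height: int, width: int, patch_size: int, embed_dim: int) -> FeatureShape:
    """Split an image into non-overlapping patches and embed each one."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return FeatureShape(height // patch_size, width // patch_size, embed_dim)


def patch_merging(shape: FeatureShape) -> FeatureShape:
    """Swin-style patch merging: halve the resolution, double the channels."""
    if shape.height % 2 or shape.width % 2:
        raise ValueError("feature map size must be even to merge 2x2 patches")
    return FeatureShape(shape.height // 2, shape.width // 2, shape.channels * 2)


def upsample_and_merge(shape: FeatureShape, skip: FeatureShape) -> FeatureShape:
    """Assumed UM step: double the resolution, halve the channels, merge with a skip feature."""
    up = FeatureShape(shape.height * 2, shape.width * 2, shape.channels // 2)
    if up != skip:
        raise ValueError("upsampled feature does not match the skip feature")
    return up


def encoder_shapes(height: int, width: int, patch_size: int = 4,
                   embed_dim: int = 96, num_stages: int = 4) -> List[FeatureShape]:
    """Multiscale shapes produced by one hierarchical encoder branch."""
    shapes = [patch_partition(height, width, patch_size, embed_dim)]
    for _ in range(num_stages - 1):
        shapes.append(patch_merging(shapes[-1]))
    return shapes


def trace_swinsunet_shapes(height: int, width: int) -> Tuple[List[FeatureShape], List[FeatureShape], List[FeatureShape]]:
    """Trace shapes through the Siamese encoder, the fusion step, and the decoder."""
    # Siamese encoder: both dates pass through the same branch definition.
    enc_t1 = encoder_shapes(height, width)
    enc_t2 = encoder_shapes(height, width)
    if enc_t1 != enc_t2:
        raise ValueError("bitemporal branches must produce matching shapes")
    # Fusion (assumption): merged bitemporal features keep the per-stage shape.
    fused = list(enc_t1)
    # Decoder: start from the deepest fused feature and recover resolution.
    current = fused[-1]
    decoder = [current]
    for skip in reversed(fused[:-1]):
        current = upsample_and_merge(current, skip)
        decoder.append(current)
    return enc_t1, fused, decoder


if __name__ == "__main__":
    encoder, fused, decoder = trace_swinsunet_shapes(256, 256)
    for stage, shape in enumerate(encoder, start=1):
        print("encoder stage", stage, shape)
    for stage, shape in enumerate(decoder, start=1):
        print("decoder stage", stage, shape)
```

Under these assumptions, a 256 x 256 image pair yields encoder features from 64 x 64 x 96 down to 8 x 8 x 768, and the decoder mirrors that path back to 64 x 64 x 96 before a final head (not modeled here) would produce the change map.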
