4.7 Article

Improved Swin Transformer-Based Semantic Segmentation of Postearthquake Dense Buildings in Urban Areas Using Remote Sensing Images

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSTARS.2022.3225150

Keywords

Attention mechanism; complex weather disturbances; dense seismic building segmentation; feature fusion; improved Swin Transformer; remote sensing images

Ask authors/readers for more resources

This paper proposes an improved Swin Transformer model for segmenting dense urban buildings from remote sensing images with complex backgrounds. A convolutional block attention module is utilized to focus on significant features, and hierarchical feature maps are fused to enhance the feature extraction process. The effectiveness and superiority of the proposed method are validated through ablation experiments and comparative studies, achieving an improvement of 1.3% in mean intersection-over-union compared to the original model.
Timely acquiring the earthquake-induced damage of buildings is crucial for emergency assessment and post-disaster rescue. Optical remote sensing is a typical method for obtaining seismic data due to its wide coverage and fast response speed. Convolutional neural networks (CNNs) are widely applied for remote sensing image recognition. However, insufficient extraction and expression ability of global correlations between local image patches limit the performance of dense building segmentation. This paper proposes an improved Swin Transformer to segment dense urban buildings from remote sensing images with complex backgrounds. The original Swin Transformer is used as a backbone of the encoder, and a convolutional block attention module is employed in the linear embedding and patch merging stages to focus on significant features. Hierarchical feature maps are then fused to strengthen the feature extraction process and fed into the UPerNet (as the decoder) to obtain the final segmentation map. Collapsed and non-collapsed buildings are labeled from remote sensing images of the Yushu and Beichuan earthquakes. Data augmentations of horizontal and vertical flipping, brightness adjustment, uniform fogging, and non-uniform fogging are performed to simulate actual situations. The effectiveness and superiority of the proposed method over the original Swin Transformer and several mature CNN-based segmentation models are validated by ablation experiments and comparative studies. The results show that the mean intersection-over-union of the improved Swin Transformer reaches 88.53%, achieving an improvement of 1.3% compared to the original model. The stability, robustness, and generalization ability of dense building recognition under complex weather disturbances are also validated.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available