Article

A Swin Transformer-Based Encoding Booster Integrated in U-Shaped Network for Building Extraction

Journal

REMOTE SENSING
Volume 14, Issue 11

Publisher

MDPI
DOI: 10.3390/rs14112611

Keywords

building extraction; deep learning; U-shaped network; swin Transformer; encoding booster; self-attention; semantic information

Funding

  1. NSFC [61901341, 61403291]
  2. China Postdoctoral Science Foundation [2021TQ0260]
  3. Natural Science Foundation of Shaanxi Province [2020JQ-301]
  4. GHfund [202107020822, 202202022633]


This paper proposes a shifted-window (swin) Transformer-based encoding booster for efficient extraction of building areas in remote sensing images. By integrating the encoding booster in a specially designed U-shaped network, the feature-level fusion of local and large-scale semantics is achieved. Experimental results demonstrate that the proposed method achieves higher accuracy in extracting buildings of different scales compared to state-of-the-art networks.
Building extraction is a popular topic in remote sensing image processing. Efficient building extraction algorithms can identify and segment building areas to provide informative data for downstream tasks. Currently, building extraction is mainly achieved by deep convolutional neural networks (CNNs) based on the U-shaped encoder-decoder architecture. However, the local receptive field of the convolutional operation makes it difficult for CNNs to fully capture the semantic information of large buildings, especially in high-resolution remote sensing images. Motivated by the recent success of the Transformer in computer vision tasks, in this paper we first propose a shifted-window (swin) Transformer-based encoding booster. The proposed encoding booster includes a swin Transformer pyramid containing patch merging layers for down-sampling, which enables the encoding booster to extract semantics from multi-level features at different scales. Most importantly, the receptive field is significantly expanded by the self-attention mechanism of the swin Transformer, allowing the encoding booster to capture large-scale semantic information effectively and transcend the limitations of CNNs. Furthermore, we integrate the encoding booster into a specially designed U-shaped network in a novel manner, named the Swin Transformer-based Encoding Booster U-shaped Network (STEB-UNet), to achieve the feature-level fusion of local and large-scale semantics. Remarkably, compared with other Transformer-based networks, the computational complexity and memory requirements of the STEB-UNet are significantly reduced by the swin design, making the network much easier to train. Experimental results show that the STEB-UNet can effectively discriminate and extract buildings of different scales, and it demonstrates higher accuracy than state-of-the-art networks on public datasets.
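The two swin-specific operations the abstract relies on — window-limited self-attention and patch merging for the pyramid's down-sampling — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the shapes, function names, and the 7x7 window size are illustrative assumptions following the standard swin design. The key point is that attention is computed within fixed-size windows, so its cost grows linearly with image area rather than quadratically, which is why the abstract notes reduced complexity compared with other Transformer-based networks.

```python
import numpy as np

def window_partition(x, window_size):
    # Split an (H, W, C) feature map into non-overlapping windows of
    # window_size x window_size tokens. Self-attention is then computed
    # within each window, so the quadratic cost applies only to
    # window_size**2 tokens, not to all H*W tokens.
    H, W, C = x.shape
    M = window_size
    x = x.reshape(H // M, M, W // M, M, C)
    # -> (num_windows, tokens_per_window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M * M, C)

def patch_merging(x):
    # 2x2 patch merging: halve the spatial resolution and concatenate
    # the four neighbours along channels (C -> 4C). Stacking this layer
    # builds the multi-scale swin Transformer pyramid described above.
    x0 = x[0::2, 0::2, :]
    x1 = x[1::2, 0::2, :]
    x2 = x[0::2, 1::2, :]
    x3 = x[1::2, 1::2, :]
    return np.concatenate([x0, x1, x2, x3], axis=-1)

feat = np.zeros((56, 56, 96))              # an assumed stage-1 feature map
wins = window_partition(feat, 7)
print(wins.shape)                          # (64, 49, 96): 64 windows, 49 tokens each
merged = patch_merging(feat)
print(merged.shape)                        # (28, 28, 384): next pyramid level
```

The "shifted" half of shifted-window attention simply offsets the grid (e.g. `np.roll(feat, (-3, -3), axis=(0, 1))` before partitioning) on alternate layers so that information flows between neighbouring windows.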

Authors

