Article

A Swin Transformer-Based Encoding Booster Integrated in U-Shaped Network for Building Extraction

Journal

REMOTE SENSING
Volume 14, Issue 11

Publisher

MDPI
DOI: 10.3390/rs14112611

Keywords

building extraction; deep learning; U-shaped network; swin Transformer; encoding booster; self-attention; semantic information

Funding

  1. NSFC [61901341, 61403291]
  2. China Postdoctoral Science Foundation [2021TQ0260]
  3. National Natural Science Foundation of Shaanxi Province [2020JQ-301]
  4. GHfund [202107020822, 202202022633]

Abstract

This paper proposes a shifted-window (swin) Transformer-based encoding booster for efficient extraction of building areas in remote sensing images. By integrating the encoding booster in a specially designed U-shaped network, the feature-level fusion of local and large-scale semantics is achieved. Experimental results demonstrate that the proposed method achieves higher accuracy in extracting buildings of different scales compared to state-of-the-art networks.
Building extraction is a popular topic in remote sensing image processing. Efficient building extraction algorithms can identify and segment building areas to provide informative data for downstream tasks. Currently, building extraction is mainly achieved by deep convolutional neural networks (CNNs) based on the U-shaped encoder-decoder architecture. However, the local receptive field of the convolutional operation poses a challenge for CNNs to fully capture the semantic information of large buildings, especially in high-resolution remote sensing images. Considering the recent success of the Transformer in computer vision tasks, in this paper, we first propose a shifted-window (swin) Transformer-based encoding booster. The proposed encoding booster includes a swin Transformer pyramid containing patch merging layers for down-sampling, which enables our encoding booster to extract semantics from multi-level features at different scales. Most importantly, the receptive field is significantly expanded by the global self-attention mechanism of the swin Transformer, allowing the encoding booster to capture large-scale semantic information effectively and transcend the limitations of CNNs. Furthermore, we integrate the encoding booster into a specially designed U-shaped network in a novel manner, yielding the Swin Transformer-based Encoding Booster U-shaped Network (STEB-UNet), to achieve the feature-level fusion of local and large-scale semantics. Remarkably, compared with other Transformer-based networks, the computational complexity and memory requirements of the STEB-UNet are significantly reduced owing to the swin design, making the network much easier to train. Experimental results show that the STEB-UNet can effectively discriminate and extract buildings of different scales and achieves higher accuracy than state-of-the-art networks on public datasets.
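The patch merging down-sampling that the abstract attributes to the swin Transformer pyramid can be sketched as follows. This is a minimal NumPy illustration of the standard swin patch-merging operation, not the authors' implementation; the function and variable names (`patch_merging`, `proj`) are hypothetical:

```python
import numpy as np

def patch_merging(x, w):
    """Swin-style patch merging: concatenate each 2x2 neighborhood of
    patches along the channel axis (C -> 4C), then apply a linear
    projection down to 2C, halving the spatial resolution.

    x: (H, W, C) feature map with H and W even
    w: (4C, 2C) projection weights
    """
    H, W, C = x.shape
    # Gather the four interleaved sub-grids forming each 2x2 neighborhood
    x0 = x[0::2, 0::2, :]  # top-left patches
    x1 = x[1::2, 0::2, :]  # bottom-left patches
    x2 = x[0::2, 1::2, :]  # top-right patches
    x3 = x[1::2, 1::2, :]  # bottom-right patches
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (H/2, W/2, 4C)
    return merged @ w  # (H/2, W/2, 2C)

# Example: one merging step on a toy 8x8 feature map with 16 channels
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
proj = rng.standard_normal((64, 32))
out = patch_merging(feat, proj)
print(out.shape)  # (4, 4, 32)
```

Repeating this step builds the multi-level pyramid the abstract describes: each stage halves the spatial resolution while doubling the channel dimension, so self-attention at deeper stages operates over progressively larger effective regions.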

