☆ 4.7 Article

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2022)

期刊

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

卷 60, 期 -, 页码 -

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TGRS.2022.3144894

关键词

Transformers; Semantics; Image segmentation; Feature extraction; Remote sensing; Decoding; Convolutional neural networks; Boundary detection; semantic segmentation; squeeze-and-excitation (SE) block; Swin transformer; very high resolution (VHR) remote sensing imagery

类别

Geochemistry & Geophysics Engineering, Electrical & Electronic Remote Sensing Imaging Science & Photographic Technology

资金

National Key Research and Development Program of China [2018YFB0504800, 2018YFB0504801, 06-Y30F04-9001-20/22]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This article presents a hybrid deep neural network that combines transformer and convolutional neural network (CNN) for semantic segmentation of very high resolution remote sensing imagery. The network utilizes a new universal backbone Swin transformer for feature extraction and incorporates various strategies for multiscale context modeling. It achieves improved accuracy through skip connections and an auxiliary boundary detection branch.

This article presents a transformer and convolutional neural network (CNN) hybrid deep neural network for semantic segmentation of very high resolution (VHR) remote sensing imagery. The model follows an encoder-decoder structure. The encoder module uses a new universal backbone Swin transformer to extract features to achieve better long-range spatial dependencies modeling. The decoder module draws on some effective blocks and successful strategies of CNN-based models in remote sensing image segmentation. In the middle of the framework, an atrous spatial pyramid pooling block based on depthwise separable convolution (SASPP) is applied to obtain a multiscale context. A U-shaped decoder is used to gradually restore the size of the feature maps. Three skip connections are built between the encoder and decoder feature maps of the same size to maintain the transmission of local details and enhance the communication of multiscale features. A squeeze-and-excitation (SE) channel attention block is added before segmentation for feature augmentation. An auxiliary boundary detection branch is combined to provide edge constraints for semantic segmentation. Extensive ablation experiments were conducted on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam benchmarks to test the effectiveness of multiple components of the network. At the same time, the proposed method is compared with the current state-of-the-art methods on the two benchmarks. The proposed hybrid network achieved the second highest overall accuracy (OA) on both the Potsdam and Vaihingen benchmarks (code and models are available at https://github.com/zq7734509/mmsegmentation- multilayer).

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

期刊

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

期刊

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文