Article

Rethinking Transformers for Semantic Segmentation of Remote Sensing Images

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2023)

Journal

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Volume 61, Issue -, Pages -

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TGRS.2023.3302024

Keywords

Encoder-decoder structure; global-local transformer; remote sensing (RS); semantic segmentation

Categories

Geochemistry & Geophysics; Engineering, Electrical & Electronic; Remote Sensing; Imaging Science & Photographic Technology

Summary

In this article, a global-local transformer segmentor (GLOTS) framework is proposed for the semantic segmentation of remote sensing (RS) images. It adopts transformers for both encoding and decoding to acquire consistent feature representations and to fully exploit global and local features. Experimental results demonstrate that GLOTS achieves better performance than several state-of-the-art methods on three benchmark RS datasets.

Abstract

Transformers have been widely applied in image processing tasks as a substitute for convolutional neural networks (CNNs) for feature extraction, owing to their strength in global context modeling and their flexibility in model generalization. However, existing transformer-based methods for the semantic segmentation of remote sensing (RS) images still have two main limitations: 1) the transformer encoder is generally combined with a CNN-based decoder, which leads to inconsistent feature representations; and 2) the strategies for exploiting global and local context information are not sufficiently effective. Therefore, this article proposes a global-local transformer segmentor (GLOTS) framework for the semantic segmentation of RS images that acquires consistent feature representations by adopting transformers for both encoding and decoding: a masked image modeling (MIM) pretrained transformer encoder learns semantically rich representations of the input images, and a multiscale global-local transformer decoder is designed to fully exploit global and local features. Specifically, the transformer decoder uses a feature separation-aggregation module (FSAM) to make full use of features at different scales, and it adopts a global-local attention module (GLAM), containing a global attention block (GAB) and a local attention block (LAB), to capture global and local context information, respectively. Furthermore, a learnable progressive upsampling strategy (LPUS) is proposed to restore the resolution step by step, flexibly recovering fine-grained details during upsampling. Experimental results on three benchmark RS datasets demonstrate that GLOTS achieves better performance than several state-of-the-art methods, and ablation studies further verify the superiority of the proposed framework. The code will be available at https://github.com/lyhnsn/GLOTS.
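The abstract does not specify how GAB and LAB compute attention. As a rough illustration only, the pure-Python sketch below shows the general idea of pairing a global branch with a local branch. It assumes that the global branch is plain scaled dot-product self-attention over all tokens of an H x W feature map, that the local branch restricts attention to non-overlapping window x window patches, that queries, keys, and values are the tokens themselves (no learned projections), and that the two branches are fused by element-wise summation. The function names (`global_attention`, `local_attention`, `global_local_attention`) are hypothetical, and this is not the authors' GLAM implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention for a single query vector.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(d)]


def global_attention(tokens: List[Vector]) -> List[Vector]:
    # Hypothetical GAB stand-in: every token attends to every token.
    # Assumption: Q = K = V = tokens (no learned projections).
    return [attention(t, tokens, tokens) for t in tokens]


def local_attention(tokens: List[Vector], height: int, width: int,
                    window: int) -> List[Vector]:
    # Hypothetical LAB stand-in: each token attends only to the tokens in its
    # non-overlapping window x window patch of the row-major H x W grid.
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    out: List[Vector] = []
    for idx, token in enumerate(tokens):
        row, col = divmod(idx, width)
        r0, c0 = (row // window) * window, (col // window) * window
        patch = [tokens[r * width + c]
                 for r in range(r0, r0 + window)
                 for c in range(c0, c0 + window)]
        out.append(attention(token, patch, patch))
    return out


def global_local_attention(tokens: List[Vector], height: int, width: int,
                           window: int) -> List[Vector]:
    # Assumed fusion: element-wise sum of the global and local branches.
    if len(tokens) != height * width:
        raise ValueError("expected height * width tokens")
    glob = global_attention(tokens)
    loc = local_attention(tokens, height, width, window)
    return [[a + b for a, b in zip(gv, lv)] for gv, lv in zip(glob, loc)]


if __name__ == "__main__":
    # A 4 x 4 feature map with 2 channels per token.
    grid = [[float(i), float(i % 3)] for i in range(16)]
    fused = global_local_attention(grid, height=4, width=4, window=2)
    print(len(fused), len(fused[0]))  # prints: 16 2
```

Two quick sanity checks follow from this design: with `window=1` each token can only attend to itself, so the local branch returns its input unchanged; with `window` equal to the grid size, the local branch degenerates into the global one. Summation is only one plausible fusion choice; concatenation followed by a projection would be another.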
