Article

TRS: Transformers for Remote Sensing Scene Classification

Journal

REMOTE SENSING
Volume 13, Issue 20, Pages -

Publisher

MDPI
DOI: 10.3390/rs13204143

Keywords

transformers; deep convolutional neural networks; multi-head self-attention; remote sensing scene classification

Funding

  1. Provincial Science and Technology Innovation Special Fund Project of Jilin Province [20190302026GX]
  2. Natural Science Foundation of Jilin Province [20200201037JC]

Remote sensing scene classification remains challenging due to the complexity and variety of scenes. With the development of attention-based methods, Convolutional Neural Networks (CNNs) have achieved competitive performance in remote sensing scene classification tasks. As an important attention-based model, the Transformer has achieved great success in the field of natural language processing, and it has recently been applied to computer vision tasks. However, most existing methods divide the original image into multiple patches and encode the patches as the input of the Transformer, which limits the model's ability to learn the overall features of the image. In this paper, we propose a new remote sensing scene classification method, Remote Sensing Transformer (TRS), a powerful pure CNNs -> Convolution + Transformer -> pure Transformers structure. First, we integrate self-attention into ResNet in a novel way, using our proposed Multi-Head Self-Attention layer instead of the 3 x 3 spatial convolutions in the bottleneck. Then we connect multiple pure Transformer encoders to further improve representation learning, relying entirely on attention. Finally, we use a linear classifier for classification. We train our model on four public remote sensing scene datasets: UC-Merced, AID, NWPU-RESISC45, and OPTIMAL-31. The experimental results show that TRS exceeds the state-of-the-art methods and achieves higher accuracy.
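The core architectural move described in the abstract is to replace the 3 x 3 spatial convolution in a ResNet bottleneck with multi-head self-attention computed over the spatial positions of the feature map. The following is a minimal NumPy sketch of that idea, not the paper's actual implementation: the feature map is flattened into a sequence of H x W positions, attended over per head, and reshaped back. All weights are random stand-ins, and the channel count, head count, and grid size are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_2d(x, num_heads=4, seed=0):
    """Multi-head self-attention over the spatial grid of a feature map.

    x: (C, H, W) feature map. Illustrative stand-in for the 3x3 spatial
    convolution in a ResNet bottleneck; projection weights are random
    here, not learned, so this only demonstrates the data flow.
    """
    C, H, W = x.shape
    assert C % num_heads == 0
    d = C // num_heads                          # per-head channel dimension
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    seq = x.reshape(C, H * W).T                 # (N, C): one token per spatial position
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    out = np.empty_like(seq)
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        # Scaled dot-product attention within this head's channel slice.
        attn = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(d))   # (N, N)
        out[:, sl] = attn @ v[:, sl]
    return out.T.reshape(C, H, W)               # back to feature-map layout

y = mhsa_2d(np.random.default_rng(1).standard_normal((8, 4, 4)))
print(y.shape)  # (8, 4, 4): same spatial shape as the input, like a conv layer
```

Because every output position attends to all H x W positions, the receptive field is global in a single layer, which is the motivation the abstract gives for swapping attention in for the locally constrained 3 x 3 convolution.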
