Journal
REMOTE SENSING
Volume 14, Issue 18
Publisher
MDPI
DOI: 10.3390/rs14184656
Keywords
land cover classification; polarimetric SAR; deep learning; vision transformer
Categories
Funding
- Major Project of Chinese High-resolution Earth Observation System [30-H30C01-9004-19/21]
- National Natural Science Foundation of China [62171023, 62222102]
This paper proposes a classification method based on the vision Transformer (ViT), which extracts features over the global extent of an image using self-attention blocks and pre-trains the model with a masked autoencoder (MAE). Experimental results demonstrate the superiority of this method for PolSAR image classification.
Deep learning methods have been widely studied for polarimetric synthetic aperture radar (PolSAR) land cover classification. The scarcity of labeled PolSAR samples and the small receptive field of typical models limit the performance of deep learning methods for this task. In this paper, a vision Transformer (ViT)-based classification method is proposed. The ViT architecture extracts features over the global extent of an image through self-attention blocks. Its powerful feature representation capability is equivalent to a flexible receptive field, making it suitable for PolSAR image classification at different resolutions. In addition, because labeled data are scarce, the masked autoencoder (MAE) method is used to pre-train the proposed model on unlabeled data. Experiments are carried out on the Flevoland dataset acquired by NASA/JPL AIRSAR and the Hainan dataset acquired by the Aerial Remote Sensing System of the Chinese Academy of Sciences. The results on both datasets demonstrate the superiority of the proposed method.
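The two mechanisms named in the abstract — global self-attention over image patches and MAE-style masking of tokens before encoding — can be illustrated with a minimal numpy sketch. This is an illustrative toy, not the authors' implementation: the token count, dimensions, mask ratio, and function names are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention: every patch token attends to every
    other token, giving the global receptive field the abstract describes."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scale = np.sqrt(Q.shape[-1])
    attn = softmax(Q @ K.T / scale)  # (N, N) weights over all patches
    return attn @ V

def mae_random_mask(tokens, mask_ratio=0.75, rng=None):
    """MAE-style masking: keep a random subset of patch tokens; during
    self-supervised pre-training the encoder sees only the visible ones."""
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep = np.sort(rng.permutation(n)[:n_keep])
    return tokens[keep], keep

# Toy example: 16 patch tokens of dimension 8 (sizes are illustrative)
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))

visible, keep_idx = mae_random_mask(tokens)  # 4 of 16 tokens survive
out = self_attention(visible, Wq, Wk, Wv)
print(out.shape)  # (4, 8): only the visible 25% of tokens are encoded
```

In a full MAE pipeline a lightweight decoder would then reconstruct the masked patches, and the pre-trained encoder would be fine-tuned on the small labeled set; only the masking and attention steps are sketched here.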