4.7 Article

SITS-Former: A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification

Publisher

ELSEVIER
DOI: 10.1016/j.jag.2021.102651

Keywords

Pre-training; Satellite image time series (SITS); Self-supervised learning; Sentinel-2; Transformer

Funding

  1. National Natural Science Foundation of China [41901356, 61906096, 41701512]
  2. Natural Science Foundation of Jiangsu Province [BK20180786]

Abstract

This article introduces a pre-trained representation model called SITS-Former for Sentinel-2 time series classification. The model is pre-trained using self-supervised learning on a large amount of unlabeled data and then fine-tuned for a target classification task. Experimental results on two crop classification tasks show that SITS-Former outperforms state-of-the-art approaches and greatly reduces the burden of manual labeling.
Sentinel-2 images provide a rich source of information for a variety of land cover, vegetation, and environmental monitoring applications due to their high spectral, spatial, and temporal resolutions. Recently, deep learning-based classification of Sentinel-2 time series has become a popular solution for vegetation classification and land cover mapping, but it often demands a large number of manually annotated labels. Improving classification performance with limited labeled data remains a challenge in many real-world remote sensing applications. To address label scarcity, we present SITS-Former (SITS stands for Satellite Image Time Series and Former stands for Transformer), a pre-trained representation model for Sentinel-2 time series classification. SITS-Former adopts a Transformer encoder as the backbone and takes time series of image patches as input to learn spatio-spectral-temporal features. Following the principles of self-supervised learning, we pre-train SITS-Former on massive unlabeled Sentinel-2 time series via a missing-data imputation proxy task. Given an incomplete time series in which some patches are randomly masked, the network is asked to regress the central pixels of these masked patches from the remaining ones. By doing so, the network captures high-level spatial and temporal dependencies in the data and learns discriminative features. After pre-training, the network can adapt the learned features to a target classification task through fine-tuning. To the best of our knowledge, this is the first study that exploits self-supervised learning for patch-based representation learning and classification of SITS. We quantitatively evaluate the quality of the learned features by transferring them to two crop classification tasks, showing that SITS-Former outperforms state-of-the-art approaches and yields a significant improvement (2.64% to 3.30% in overall accuracy) over the purely supervised model. The proposed model provides an effective tool for SITS-related applications as it greatly reduces the burden of manual labeling. The source code will be released at https://github.com/linlei1214/SITS-Former upon publication.
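
The pre-training idea described above (randomly mask some patches in a time series and regress the central-pixel spectra of the masked acquisitions with a Transformer encoder) can be illustrated with a short PyTorch sketch. This is not the authors' released implementation: the band count, patch size, model dimensions, mask ratio, and the names PatchTSEncoder and pretrain_step are hypothetical placeholders chosen for illustration.

```python
# Minimal sketch of a masked-patch imputation pretext task for a
# Transformer encoder over a Sentinel-2 patch time series (assumed shapes).
import torch
import torch.nn as nn

class PatchTSEncoder(nn.Module):
    """Transformer encoder over a time series of image patches (illustrative)."""
    def __init__(self, n_bands=10, patch=5, d_model=128, n_layers=3, n_heads=8, max_len=64):
        super().__init__()
        self.embed = nn.Linear(n_bands * patch * patch, d_model)   # flatten patch -> token
        self.pos = nn.Embedding(max_len, d_model)                  # acquisition-order encoding (assumption)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model)) # learned token for masked dates
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_bands)                    # regress central-pixel spectrum

    def forward(self, patches, mask):
        # patches: (B, T, n_bands, patch, patch); mask: (B, T) bool, True = masked date
        B, T = patches.shape[:2]
        tok = self.embed(patches.flatten(2))                                   # (B, T, d_model)
        tok = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, T, -1), tok)
        tok = tok + self.pos(torch.arange(T, device=tok.device))
        return self.head(self.encoder(tok))                                    # (B, T, n_bands)

def pretrain_step(model, patches, mask_ratio=0.15):
    """One self-supervised step: mask random dates, regress their central pixels."""
    B, T, C, P, _ = patches.shape
    mask = torch.rand(B, T, device=patches.device) < mask_ratio
    target = patches[..., P // 2, P // 2]          # (B, T, C) central-pixel spectra
    pred = model(patches, mask)
    return ((pred - target) ** 2)[mask].mean()     # MSE on masked positions only

# Usage with random data standing in for Sentinel-2 patch time series:
model = PatchTSEncoder()
x = torch.randn(4, 20, 10, 5, 5)                   # 4 series, 20 dates, 10 bands, 5x5 patches
loss = pretrain_step(model, x)
loss.backward()
```

For fine-tuning, as the abstract notes, the regression head would be replaced by a classification head and the encoder adapted to the target task on labeled data.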
