Article

Self-Supervised Pretraining via Multimodality Images With Transformer for Change Detection

Journal

IEEE Transactions on Geoscience and Remote Sensing

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

DOI: 10.1109/TGRS.2023.3271024

Keywords

Task analysis; Feature extraction; Remote sensing; Data models; Training; Transformers; Self-supervised learning (SSL); Change detection (CD); temporal fusion


This article proposes an RGB-elevation contrastive and masked image prediction pretraining framework, evaluated by transferring the pretrained model to the change detection (CD) task. The framework combines masked image modeling, instance discrimination, and a temporal fusion module to achieve state-of-the-art results on CD datasets, outperforming supervised learning methods and two mainstream SSL methods.
Self-supervised learning (SSL) has shown remarkable success in image representation learning. Among SSL methods, masked image modeling and contrastive learning are the most recent and dominant; however, the two approaches behave differently once transferred to various downstream tasks. In this article, we propose a red, green, and blue (RGB)-elevation contrastive and masked image prediction pretraining framework, where the elevation is a normalized digital surface model (nDSM). We then evaluate the learned representation by transferring the pretrained model to the change detection (CD) task. To this end, we leverage the recently proposed vision transformer's capability of attending to objects and combine it with a pretext task consisting of masked image modeling and instance discrimination for fine-tuning the spatial tokens. In addition, the CD task requires information interaction between the two temporal remote sensing images. To address this, we propose a plug-in temporal fusion module based on masked cross-attention, and we evaluate its effectiveness on three open CD datasets by using it to initialize the supervised training weights. Our method improves over supervised learning methods and two mainstream SSL methods, momentum contrast (MoCo) and DINO, on the CD task. Our experimental results also achieve the state of the art on four CD datasets. The code will be available at URL.
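
The abstract gives no implementation details, but the pairing of a cross-modal (RGB-elevation) contrastive objective with a masked-image-modeling reconstruction objective can be illustrated. Below is a minimal PyTorch sketch, assuming InfoNCE-style instance discrimination between global RGB and nDSM embeddings plus an MSE loss over masked patches; the function name, the temperature, and the weighting `lam` are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pretext_loss(rgb_emb, elev_emb, pred_patches, target_patches,
                 patch_mask, temperature=0.07, lam=1.0):
    """Combine cross-modal contrastive and masked-image-modeling losses.

    rgb_emb, elev_emb: (B, D) global embeddings of an RGB image and its
                       co-registered nDSM (positive pairs share an index).
    pred_patches:      (B, N, P) patch pixels predicted by the decoder.
    target_patches:    (B, N, P) ground-truth patch pixels.
    patch_mask:        (B, N) bool, True where a patch was masked out.
    """
    # InfoNCE-style instance discrimination across the two modalities:
    # each RGB embedding should match the elevation embedding of the
    # same scene and repel all others in the batch.
    rgb = F.normalize(rgb_emb, dim=-1)
    elev = F.normalize(elev_emb, dim=-1)
    logits = rgb @ elev.t() / temperature            # (B, B) similarities
    labels = torch.arange(rgb.size(0), device=rgb.device)
    contrastive = F.cross_entropy(logits, labels)

    # Masked image modeling: MSE computed on the masked patches only.
    per_patch = ((pred_patches - target_patches) ** 2).mean(dim=-1)  # (B, N)
    mim = (per_patch * patch_mask).sum() / patch_mask.sum().clamp(min=1)

    return contrastive + lam * mim  # lam is a hypothetical weighting
```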
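
Likewise, the plug-in temporal fusion module is described only as being based on masked cross-attention. One plausible reading is sketched below, assuming tokens of one acquisition date query tokens of the other through `torch.nn.MultiheadAttention` with a key-padding mask; the module name, residual design, and mask semantics are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalFusionModule(nn.Module):
    """Masked cross-attention between two temporal token streams (sketch)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens_t1, tokens_t2, key_padding_mask=None):
        # tokens_t1, tokens_t2: (B, N, dim) patch tokens of the two dates.
        # key_padding_mask: (B, N) bool, True where tokens_t2 positions
        # (e.g., masked patches) should be ignored by the attention.
        q = self.norm_q(tokens_t1)
        kv = self.norm_kv(tokens_t2)
        fused, _ = self.attn(q, kv, kv, key_padding_mask=key_padding_mask)
        return tokens_t1 + fused   # residual: keep the date-1 stream

# Usage: fuse ViT-B-sized tokens of the two acquisition dates.
fuse = TemporalFusionModule(dim=768)
t1, t2 = torch.randn(2, 196, 768), torch.randn(2, 196, 768)
out = fuse(t1, t2)                 # (2, 196, 768)
```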

