Article

CloudViT: A Lightweight Vision Transformer Network for Remote Sensing Cloud Detection

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LGRS.2022.3233122

Keywords

Cloud computing; Remote sensing; Feature extraction; Earth; Artificial satellites; Satellites; Transformers; Attention mechanism; cloud detection; deep learning; remote sensing image; vision transformer (ViT)

This study proposes CloudViT, a lightweight vision transformer network for cloud detection in satellite imagery. Dark channel priors from multispectral imagery guide the network's feature learning, which enhances the image features and yields more accurate cloud detection results. A plug-and-play channel adaptive module is also introduced to handle the differing numbers of bands across satellite sensors.
Clouds are inevitably present in satellite images and limit, to a certain extent, how those images can be processed and applied. Cloud detection is therefore a preprocessing task in satellite image extraction and analysis. However, existing methods struggle to mine robust features, and their large parameter counts and computational cost hinder model deployment. In this letter, cloud vision transformer (CloudViT), a lightweight vision transformer network for cloud detection from satellite imagery, is proposed. Specifically, to use dark channel priors in multispectral imagery to guide feature learning, a multiscale dark channel extractor first predicts dark channels; the dark channel features and image features are then fed to an attention mechanism-based dark channel-guided context aggregation module that enhances the image features, which in turn makes the cloud detection results more accurate. In addition, to improve the network's transferability between satellite sensors, a plug-and-play channel adaptive module is proposed to handle the differing numbers of bands across sensors. Experimental results on the Landsat7 dataset show that CloudViT outperforms state-of-the-art methods while keeping the number of parameters and the computational cost small. Transfer experiments on three other datasets further show that the channel adaptive module greatly improves the model's transferability.
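
To make the notion of a dark channel concrete, the following is a minimal sketch of the classical dark channel computation: a per-pixel minimum across bands followed by a local minimum over a small window. It is not the letter's multiscale dark channel extractor, which is a learned component that predicts dark channels; the function name `dark_channel`, the `patch` parameter, and the edge-padding choice are assumptions made for illustration only.

```python
import numpy as np

def dark_channel(image, patch=3):
    """Classical dark channel of an (H, W, C) array; returns an (H, W) array.

    Hypothetical helper for illustration; `patch` is assumed to be odd.
    """
    # Step 1: per-pixel minimum across all spectral bands.
    min_bands = image.min(axis=2)
    # Step 2: local minimum over a patch x patch window, using edge padding
    # so the output keeps the input's spatial size.
    pad = patch // 2
    padded = np.pad(min_bands, pad, mode="edge")
    h, w = min_bands.shape
    out = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

# Toy input: a uniformly bright 4 x 4 scene with 3 bands and a single
# pixel that is dark in one band.
img = np.ones((4, 4, 3))
img[1, 1, 0] = 0.0
dc = dark_channel(img, patch=3)
```

In this toy input, every 3 x 3 window that contains the dark pixel gets a dark channel value of 0, while windows that are bright in all bands keep a value of 1. Uniformly bright regions such as thick cloud would be expected to behave like the latter, which is presumably the kind of cue a dark channel prior can offer a detection network.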
