4.7 Article

Multi-Attention Multi-Image Super-Resolution Transformer (MAST) for Remote Sensing

Journal

REMOTE SENSING
Volume 15, Issue 17, Pages -

Publisher

MDPI
DOI: 10.3390/rs15174183

Keywords

deep learning; remote sensing; multi-attention; Transformer; multi-scale; multi-image super-resolution

Ask authors/readers for more resources

This study proposes a Multi-Attention Multi-Image Super-Resolution Transformer (MAST) that improves the application of multi-scale information and the modeling of the attention mechanism in multi-image super-resolution tasks. The proposed model achieves a superior restoration effect by better extracting and utilizing non-redundant information, balancing the global structure and local details of the image. Comparative experiments show notable enhancements in cPSNR, with improvements of 0.91 dB and 0.81 dB observed in the NIR and RED bands of the PROBA-V dataset, respectively, compared to existing state-of-the-art methods.
Deep-learning-driven multi-image super-resolution (MISR) reconstruction techniques have significant application value in the field of aerospace remote sensing. In particular, Transformer-based models have shown outstanding performance in super-resolution tasks. However, current MISR models have some deficiencies in the application of multi-scale information and the modeling of the attention mechanism, leading to an insufficient utilization of complementary information in multiple images. In this context, we innovatively propose a Multi-Attention Multi-Image Super-Resolution Transformer (MAST), which involves improvements in two main aspects. Firstly, we present a Multi-Scale and Mixed Attention Block (MMAB). With its multi-scale structure, the network is able to extract image features from different scales to obtain more contextual information. Additionally, the introduction of mixed attention allows the network to fully explore high-frequency features of the images in both channel and spatial dimensions. Secondly, we propose a Collaborative Attention Fusion Block (CAFB). By incorporating channel attention into the self-attention layer of the Transformer, we aim to better establish global correlations between multiple images. To improve the network's perception ability of local detailed features, we introduce a Residual Local Attention Block (RLAB). With the aforementioned improvements, our model can better extract and utilize non-redundant information, achieving a superior restoration effect that balances the global structure and local details of the image. The results from the comparative experiments reveal that our approach demonstrated a notable enhancement in cPSNR, with improvements of 0.91 dB and 0.81 dB observed in the NIR and RED bands of the PROBA-V dataset, respectively, in comparison to the existing state-of-the-art methods. Extensive experiments demonstrate that the method proposed in this paper can provide a valuable reference for solving multi-image super-resolution tasks for remote sensing.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available