Article

SwinWave-SR: Multi-scale lightweight underwater image super-resolution

Journal

INFORMATION FUSION
Volume 103, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.inffus.2023.102127

Keywords

Vision transformer (ViT); Wavelet transform; Super-resolution; Underwater Image Enhancement (UIE); Lightweight model

The resource-limited nature of underwater vision equipment affects underwater robotics and ocean engineering tasks. Super-resolution methods, particularly using Vision Transformers (ViTs), have emerged to enhance low-resolution underwater images. However, ViTs face challenges in handling severe degradation in underwater imaging. In contrast, Multi-scale ViTs (MViTs) overcome these challenges by preserving long-range dependencies through evolving channel capacity. This study proposes a novel algorithm, SwinWave-SR, for efficient and accurate multi-scale super-resolution for underwater images.
The resource-limited nature of underwater vision equipment yields poor, low-resolution imagery that degrades downstream underwater robotics and ocean engineering tasks. Underwater Image Enhancement (UIE) methods, particularly Super-Resolution (SR), have emerged to tackle this challenge by restoring a low-resolution image to a high-quality counterpart. Vision Transformers (ViTs) have recently been employed for SR tasks thanks to their superior performance over mainstream convolutional neural networks. The success of ViTs is largely due to their self-attention mechanism; however, they may struggle with the severe and unpredictable degradation in underwater imaging. In contrast, Multi-scale ViT (MViT) variants such as the Swin transformer have overcome that challenge by preserving long-range dependencies over multi-scale feature hierarchies through evolving channel capacity. MViTs tend to achieve spatial efficiency through classical down-sampling, such as average pooling over keys/values, which results in an inevitable loss of high-frequency components. To address this shortcoming, we propose a novel algorithm, SwinWave-SR, for efficient and accurate multi-scale SR of underwater images. The proposed algorithm is based on the Swin transformer and incorporates a wavelet block that limits information loss by downsampling in an invertible fashion. Consequently, the key components are preserved to assist self-attention learning while simultaneously reducing its computational cost. To complement this, we explore a prominent compression regime, the Lottery Ticket Hypothesis (LTH), to discover a lightweight sub-network whose performance is competitive with the original model while reducing computational costs by up to 70.44%.
Overall, SwinWave-SR improves peak signal-to-noise ratio (PSNR) by 0.95 dB to 2.23 dB compared to the state-of-the-art SwinIR while reducing the number of parameters by 29.56% and the computational cost by 18.734%. Experimental results show that the proposed SwinWave-SR method outperforms state-of-the-art SR methods on four benchmark underwater datasets and significantly improves PSNR and structural similarity index (SSIM).
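
The contrast the abstract draws between average pooling and invertible wavelet downsampling can be illustrated with a small NumPy sketch. This is not the authors' wavelet block: it assumes a one-level orthonormal Haar transform applied to a single-channel image rather than to attention keys/values, and the function names (haar_dwt2, haar_idwt2, avg_pool2, psnr) are hypothetical. The sub-band naming (LH/HL) follows one common convention and may differ elsewhere.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D orthonormal Haar transform of an (H, W) array, H and W even.
    Returns four half-resolution sub-bands (LL, LH, HL, HH)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0            # low-frequency approximation
    lh = (a - b + c - d) / 2.0            # horizontal differences
    hl = (a + b - c - d) / 2.0            # vertical differences
    hh = (a - b - c + d) / 2.0            # diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def avg_pool2(x):
    """2x2 average pooling; keeps only a scaled copy of the LL band."""
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals with maximum value `peak`."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))                      # stand-in for an image in [0, 1)

    # Wavelet route: half-resolution bands, reconstruction up to rounding error.
    restored = haar_idwt2(*haar_dwt2(img))

    # Pooling route: the detail bands are discarded, so upsampling cannot recover them.
    upsampled = np.repeat(np.repeat(avg_pool2(img), 2, axis=0), 2, axis=1)

    print("Haar round trip exact:", np.allclose(restored, img))
    print("PSNR after pool + nearest upsample (dB):", psnr(img, upsampled))
```

Both routes halve the spatial resolution, but the Haar route keeps the three detail bands alongside the approximation, which is why the round trip is lossless. Average pooling equals the LL band divided by two and drops the rest.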
