Journal
IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 30, Issue -, Pages 2923-2934
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2021.3056868
Keywords
Visualization; Convolution; Superresolution; Task analysis; Fuses; Feature extraction; Modulation; Video super-resolution; single image super-resolution; deep learning; deformable convolution; feature fusion
Funding
- National Natural Science Foundation of China [61872189, 61825601]
- Natural Science Foundation of Jiangsu Province [BK20191397]
- Postgraduate Research and Practice Innovation Program of Jiangsu Province [KYCX20_0968]
The paper introduces a multi-stage feature fusion network for the video super-resolution (VSR) task, which fuses the temporally aligned features of the supporting frames and the spatial features of the reference frame at different stages to enhance the features from low resolution to high resolution. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the VSR task.
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) frame from both its corresponding low-resolution (LR) frame (the reference frame) and multiple neighboring frames (the supporting frames). An important step in VSR is to fuse the features of the reference frame with the features of the supporting frames. The major issue with existing VSR methods is that this fusion is conducted in a one-stage manner, so the fused feature may deviate greatly from the visual information in the original LR reference frame. In this paper, we propose an end-to-end Multi-Stage Feature Fusion Network that fuses the temporally aligned features of the supporting frames and the spatial feature of the original reference frame at different stages of a feed-forward neural network architecture. In our network, the Temporal Alignment Branch is designed as an inter-frame temporal alignment module that mitigates the misalignment between the supporting frames and the reference frame. Specifically, we apply multi-scale dilated deformable convolution as the basic operation to generate temporally aligned features of the supporting frames. Afterwards, the Modulative Feature Fusion Branch, the other branch of our network, accepts the temporally aligned feature map as a conditional input and modulates the feature of the reference frame at different stages of the branch backbone. This enables the feature of the reference frame to be referenced at each stage of the feature fusion process, leading to an enhanced feature from LR to HR. Experimental results on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art performance on the VSR task.
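The modulation step described above (conditioning the reference-frame feature on the temporally aligned feature at each stage) can be sketched as a simple scale-and-shift operation. This is a minimal NumPy illustration, not the paper's actual implementation: the learned convolutions that produce the scale and shift maps are replaced here by hypothetical linear weights `w_gamma` and `w_beta`.

```python
import numpy as np

def modulate(ref_feat, aligned_feat, w_gamma, w_beta):
    """Hypothetical simplification of modulative feature fusion:
    the aligned feature conditions a per-position scale (gamma)
    and shift (beta) applied to the reference feature."""
    gamma = aligned_feat @ w_gamma  # conditional scale map
    beta = aligned_feat @ w_beta    # conditional shift map
    return ref_feat * (1.0 + gamma) + beta

# Toy example: 4 spatial positions, 3 channels.
ref = np.ones((4, 3))
cond = np.full((4, 3), 0.5)
w_g = np.zeros((3, 3))  # zero weights -> identity modulation
w_b = np.zeros((3, 3))
out = modulate(ref, cond, w_g, w_b)
```

With zero weights the modulation is the identity, which shows how the reference feature is preserved and only adjusted by the conditional input; in the actual network the weights are learned at every fusion stage.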