Article

Multi-Stage Feature Fusion Network for Video Super-Resolution

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 30, Pages 2923-2934

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2021.3056868

Keywords

Visualization; Convolution; Superresolution; Task analysis; Fuses; Feature extraction; Modulation; Video super-resolution; single image super-resolution; deep learning; deformable convolution; feature fusion

Funding

  1. National Natural Science Foundation of China [61872189, 61825601]
  2. Natural Science Foundation of Jiangsu Province [BK20191397]
  3. Postgraduate Research and Practice Innovation Program of Jiangsu Province [KYCX20_0968]

Abstract

The paper introduces a multi-stage feature fusion network for the video super-resolution task. The network fuses the temporally aligned features of the supporting frames with the spatial features of the reference frame at different stages, enhancing the features from low resolution to high resolution. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the VSR task.
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) frame from its corresponding low-resolution (LR) frame (the reference frame) together with multiple neighboring frames (the supporting frames). An important step in VSR is to fuse the feature of the reference frame with the features of the supporting frames. The major issue with existing VSR methods is that this fusion is conducted in a one-stage manner, so the fused feature may deviate greatly from the visual information in the original LR reference frame. In this paper, we propose an end-to-end Multi-Stage Feature Fusion Network that fuses the temporally aligned features of the supporting frames and the spatial feature of the original reference frame at different stages of a feed-forward neural network architecture. In our network, the Temporal Alignment Branch is designed as an inter-frame temporal alignment module that mitigates the misalignment between the supporting frames and the reference frame. Specifically, we apply multi-scale dilated deformable convolution as the basic operation to generate temporally aligned features of the supporting frames. Afterwards, the Modulative Feature Fusion Branch, the other branch of our network, accepts the temporally aligned feature map as a conditional input and modulates the feature of the reference frame at different stages of the branch backbone. This allows the feature of the reference frame to be referenced at each stage of the feature fusion process, leading to an enhanced feature from LR to HR. Experimental results on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art performance on the VSR task.
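
The Temporal Alignment Branch rests on multi-scale dilated deformable convolution. As a rough illustration only, here is a minimal PyTorch sketch of that idea, using torchvision's DeformConv2d as the deformable operator; the channel width, dilation rates, and offset-prediction head are assumptions made for the sketch, not the authors' released implementation.

    # Hypothetical sketch of multi-scale dilated deformable alignment.
    # Not the paper's official code; hyperparameters are assumptions.
    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class MultiScaleDilatedDeformAlign(nn.Module):
        # Warps a supporting-frame feature map toward the reference frame
        # using deformable convolutions at several dilation rates.
        def __init__(self, channels=64, dilations=(1, 2, 4), ksize=3):
            super().__init__()
            pad = ksize // 2
            # One offset predictor and one deformable conv per dilation.
            # A 3x3 kernel needs 2*3*3 = 18 offset channels (x and y).
            self.offset_convs = nn.ModuleList([
                nn.Conv2d(2 * channels, 2 * ksize * ksize, ksize,
                          padding=pad * d, dilation=d)
                for d in dilations])
            self.deform_convs = nn.ModuleList([
                DeformConv2d(channels, channels, ksize,
                             padding=pad * d, dilation=d)
                for d in dilations])
            # A 1x1 conv fuses the per-scale aligned features.
            self.fuse = nn.Conv2d(len(dilations) * channels, channels, 1)

        def forward(self, feat_sup, feat_ref):
            # Offsets are conditioned on both frames, so the sampling
            # grid adapts to the motion between them.
            cond = torch.cat([feat_sup, feat_ref], dim=1)
            aligned = [dconv(feat_sup, oconv(cond))
                       for oconv, dconv in zip(self.offset_convs,
                                               self.deform_convs)]
            return self.fuse(torch.cat(aligned, dim=1))

Larger dilation rates enlarge the sampling neighborhood, which is what lets a multi-scale design cover both small and large inter-frame motion.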
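
The Modulative Feature Fusion Branch modulates the reference-frame feature with the aligned features at every stage. The abstract does not spell out the modulation; one common realization of conditional feature modulation is an SFT-style per-pixel affine transform, sketched below under that assumption.

    # Hypothetical sketch of one stage of conditional feature modulation
    # (SFT-style affine transform); the paper's exact form may differ.
    import torch.nn as nn

    class ModulativeFusionStage(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            def head():  # small conv head mapping the condition to a map
                return nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels, channels, 3, padding=1))
            self.to_scale = head()   # per-pixel multiplicative term
            self.to_shift = head()   # per-pixel additive term
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True))

        def forward(self, feat_ref, feat_aligned):
            scale = self.to_scale(feat_aligned)
            shift = self.to_shift(feat_aligned)
            # Modulating (rather than replacing) feat_ref keeps the
            # reference frame's own content in every fusion stage.
            return self.body(feat_ref * (1.0 + scale) + shift)

Stacking several such stages before the upsampler would realize the multi-stage fusion the abstract describes: the reference-frame feature re-enters the computation at each stage instead of being mixed in once.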
