Journal
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Vol. 45, Issue 6, pp. 7764-7780
Publisher: IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2022.3224051
Keywords
Cameras; Streaming media; Brightness; Interpolation; Task analysis; Lenses; Visualization; Video frame interpolation; event-enhanced; high-speed scenarios; high-speed VFI dataset
This paper proposes a Fast-Slow joint synthesis framework, named SuperFast, for event-enhanced high-speed video frame interpolation. It divides the task into two sub-tasks, one for high-speed motion contents and the other for relatively slow-motion contents, and utilizes a fusion module to generate the final video frame interpolation results. Experimental results show that the proposed framework achieves state-of-the-art 200x video frame interpolation performance under high-speed motion scenarios.
Traditional frame-based video frame interpolation (VFI) methods rely on the linear-motion and brightness-invariance assumptions, which can lead to fatal errors in scenarios with high-speed motion. To tackle this challenge, inspired by the ability of event cameras to asynchronously record brightness changes at each pixel, we propose SuperFast, a Fast-Slow joint synthesis framework for event-enhanced high-speed video frame interpolation, which can generate high frame rate (5000 FPS, 200x faster) video from an input low frame rate (25 FPS) video and the corresponding event stream. In our framework, the task is divided into two sub-tasks, i.e., video frame interpolation for contents with and without high-speed motion, which are tackled by two corresponding branches: the fast synthesis pathway and the slow synthesis pathway. The fast synthesis pathway leverages a spiking neural network to encode the input event stream and combines the boundary frames to generate intermediate results through synthesis and refinement, targeting contents with high-speed motion. The slow synthesis pathway stacks the two input boundary frames and the event stream to synthesize intermediate results, focusing on relatively slow-motion contents. Finally, a fusion module with a comparison loss is utilized to generate the final video frame interpolation results. We also build a hybrid visual acquisition system containing an event camera and a high frame rate camera, and collect the first 5000 FPS High-Speed Event-enhanced Video frame Interpolation (THUHSEVI) dataset. To evaluate the performance of our proposed framework, we conduct experiments on our THUHSEVI dataset and the existing HS-ERGB dataset. Experimental results demonstrate that our framework achieves state-of-the-art 200x video frame interpolation performance in high-speed motion scenarios.
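To make the two-branch idea concrete, here is a minimal NumPy sketch of the Fast-Slow structure the abstract describes. It is not the authors' implementation: the paper's pathways are learned networks (the fast branch uses a spiking neural network, and the fusion module is trained with a comparison loss), whereas here `fast_pathway`, `slow_pathway`, and `fuse` are hypothetical stand-ins that only illustrate the data flow, i.e., event-driven interpolation for fast-moving content, frame-blend interpolation for slow content, and a per-pixel fusion of the two.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 4, 4
frame0 = rng.random((H, W))          # boundary frame at t = 0
frame1 = rng.random((H, W))          # boundary frame at t = 1
events = rng.integers(0, 5, (H, W))  # per-pixel event counts between the two frames
eps = 1e-8

def slow_pathway(f0, f1, t):
    # Stand-in for the slow branch: a plain linear blend of the
    # boundary frames, adequate where motion is slow.
    return (1.0 - t) * f0 + t * f1

def fast_pathway(f0, f1, ev, t):
    # Stand-in for the SNN-based fast branch: where event activity
    # (a proxy for fast motion) is high, lean on the temporally
    # nearer boundary frame instead of the naive blend.
    near = f0 if t < 0.5 else f1
    w = ev / (ev.max() + eps)
    return w * near + (1.0 - w) * slow_pathway(f0, f1, t)

def fuse(fast, slow, ev):
    # Stand-in for the fusion module: a per-pixel mask derived from
    # event activity selects the fast branch for fast-moving pixels
    # and the slow branch elsewhere.
    m = ev / (ev.max() + eps)
    return m * fast + (1.0 - m) * slow

t = 0.25  # normalized timestamp of the frame to interpolate
frame_t = fuse(fast_pathway(frame0, frame1, events, t),
               slow_pathway(frame0, frame1, t),
               events)
print(frame_t.shape)
```

In the actual framework both pathways and the fusion mask are learned end to end; the sketch only shows why splitting the task lets each branch specialize before a fusion step reconciles them.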