Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 33, Issue 7, Pages 3502-3515
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2022.3233221
Keywords
Video coding; learned video compression; video forecasting; inter prediction; motion representation; motion compensation
This paper proposes a motion modeling approach that decomposes motion into two components: intrinsic motion and compensatory motion. The intrinsic motion captures the implicit spatiotemporal context in the historical sequence, while the compensatory motion acts as side information for structural refinement and texture enhancement. Through this decomposition, the method addresses the questions of motion representation, compensation, and coding in the learned video compression framework.
Inter prediction is the critical component of the hybrid coding framework for handling temporal redundancy. Most neural video coding methods follow the motion-compensation-based inter coding scheme, in which the motion vector (MV) plays the central role. In this paper, we propose an efficient motion modeling approach that decomposes motion into two components: the intrinsic motion and the compensatory motion. The intrinsic motion originates from the implicit spatiotemporal context hidden in the historical sequence and can be captured without spending any bits. On top of it, the compensatory motion, transmitted as side information, performs structural refinement and texture enhancement. In particular, inter prediction is performed in the feature space as a progressive temporal transition conditioned on the decomposed motion. Through this motion decomposition paradigm, we address the questions of motion representation, compensation, and coding in the learned video compression framework. After temporal prediction, the remaining pixel residue is signaled to obtain the reconstruction. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art coding performance on par with other end-to-end coding methods, and outperforms Versatile Video Coding (VVC) under the low-delay P (LDP) configuration in terms of the MS-SSIM metric.
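The coding flow described in the abstract (a bit-free prediction from history, a side-information correction, then a signaled residue) can be illustrated with a toy sketch. This is not the paper's method: the paper works in a learned feature space with neural networks, whereas the sketch below operates on raw pixels and swaps in simple stand-ins. Linear extrapolation of the two previous frames stands in for the intrinsic motion, and a coarsely quantized 2x2 block-average correction stands in for the compensatory motion. All function names, the step sizes `comp_step` and `res_step`, and the block size are assumptions made for illustration only.

```python
import numpy as np


def intrinsic_prediction(history):
    """Predict the next frame from past frames only (costs no bits).

    Assumption: linear extrapolation stands in for the learned
    intrinsic motion; the decoder can repeat it exactly.
    """
    prev2, prev1 = history[-2], history[-1]
    return 2.0 * prev1 - prev2


def quantize(x, step):
    """Uniform scalar quantization with the given step size."""
    return np.round(x / step) * step


def upsample2(x):
    """Repeat every value over a 2x2 block."""
    return np.kron(x, np.ones((2, 2)))


def encode(frame, history, comp_step=8.0, res_step=1.0):
    """Return the two signaled parts: coarse correction and residue.

    Frame height and width are assumed to be even.
    """
    base = intrinsic_prediction(history)
    diff = frame - base
    h, w = diff.shape
    # Toy "compensatory" side information: quantized 2x2 block means.
    coarse = diff.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    comp = quantize(coarse, comp_step)
    pred = base + upsample2(comp)
    # Whatever the prediction misses is sent as the pixel residue.
    residue = quantize(frame - pred, res_step)
    return comp, residue


def decode(comp, residue, history):
    """Rebuild the frame from history plus the two signaled parts."""
    pred = intrinsic_prediction(history) + upsample2(comp)
    return pred + residue
```

In this toy setting the decoder repeats the same history-based prediction as the encoder, so only the coarse correction and the residue need to be transmitted, and the reconstruction error is bounded by half the residue quantization step.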