
End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume 31, Pages 974-983

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIP.2021.3138300

Keywords

Bidirectional control; Image coding; Video compression; Motion compensation; Optimization; Entropy; Video codecs; Learned video compression; learned bi-directional motion compensation; flow field sub-sampling; flow vector prediction; end-to-end optimization

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

TUBITAK 1001 Project [217E033]
TUBITAK 2247-A Award [120C156]
Turkish Academy of Sciences (TUBA)


Summary

This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that LHBDC achieves the best rate-distortion (R-D) results among existing learned VC schemes. Ablation studies demonstrate the performance gains due to the proposed novel tools.

Abstract

Conventional video compression (VC) methods are based on motion-compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually because of the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that LHBDC achieves the best R-D results reported to date for learned VC schemes in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders (veryslow preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing the performance gains due to the proposed novel tools, such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/.
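The idea of hierarchical, bi-directional coding mentioned in the abstract can be illustrated with a small sketch. The Python below is not taken from the LHBDC repository: the function name, the dyadic (power-of-two) group-of-pictures layout, and the GOP size of 8 are assumptions chosen for illustration. It only lists the order in which intermediate frames would be coded and which past and future reference frames each one could use.

```python
def hierarchical_coding_order(gop_size):
    """Return (frame, past_ref, future_ref) triples in coding order.

    Assumption: frames 0 and gop_size are key frames that are already
    coded, and gop_size is a power of two (dyadic hierarchy).
    """
    if gop_size < 2 or gop_size & (gop_size - 1) != 0:
        raise ValueError("gop_size must be a power of two >= 2")

    order = []
    step = gop_size
    while step > 1:
        half = step // 2
        # At each level, code the midpoint of every interval whose two
        # endpoints have already been coded.
        for left in range(0, gop_size, step):
            order.append((left + half, left, left + step))
        step = half
    return order


if __name__ == "__main__":
    for frame, past, future in hierarchical_coding_order(8):
        print(f"frame {frame}: references {past} (past) and {future} (future)")
    # Frames are coded in the order 4, 2, 6, 1, 3, 5, 7.
```

In a sequential codec each frame would reference only its predecessor; here every intermediate frame has one reference on each side, which is the property the abstract credits for the advantage of bi-directional coding.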
