Proceedings Paper

NCOD: Near-Optimum Video Compression for Object Detection

Publisher

IEEE
DOI: 10.1109/ISCAS46773.2023.10182205

Keywords

Video coding; Video coding for machines (VCM); CRF; Object Detection; JND


With the rise of technologies such as smart cities, the Internet of Things (IoT), and 5G, the volume of visual data at edge and remote nodes has increased significantly. Traditional video compression solutions are optimized for human viewers and are inefficient for machine vision tasks, so this paper presents a methodology for optimizing an existing video compression standard, HEVC, for object detection. Using a dataset of videos compressed at different compression ratios, together with the corresponding object detection performance, the authors define a trade-off point between bitrate and detection performance. The resulting model predicts this trade-off point accurately, reducing bitrate by 86.56% compared with very high-quality video while preserving object detection performance.
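The abstract does not state the exact rule used to locate the trade-off point. As a minimal sketch, assuming compression is controlled by a CRF value (larger CRF means stronger compression and lower bitrate) and OD performance is measured by recall, one plausible rule picks the most compressed setting whose recall is within a tolerance `eps` of the best recall at that setting or any higher-quality one. The function name `find_knee_crf`, the tolerance `eps`, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def find_knee_crf(crfs: Sequence[int], recalls: Sequence[float], eps: float) -> int:
    """Return the largest CRF whose recall is within `eps` of the best recall
    obtained at that CRF or any lower (higher-quality) CRF.

    Hypothetical knee rule (an assumption, not the paper's definition):
    at the returned CRF, spending more bitrate (lowering the CRF) improves
    recall by at most `eps`.
    """
    if len(crfs) != len(recalls) or not crfs:
        raise ValueError("crfs and recalls must be non-empty and of equal length")
    # Sort by CRF ascending: lowest CRF = highest quality = highest bitrate.
    points = sorted(zip(crfs, recalls))
    best_so_far = float("-inf")
    knee = points[0][0]
    for crf, recall in points:
        # Best recall seen at this CRF or any higher-quality setting.
        best_so_far = max(best_so_far, recall)
        if best_so_far - recall <= eps:
            knee = crf  # keep the most compressed setting that still qualifies
    return knee


# Hypothetical rate/recall curve for one video, for illustration only.
crfs = [18, 22, 26, 30, 34, 38, 42]
recalls = [0.80, 0.80, 0.79, 0.785, 0.76, 0.70, 0.60]
print(find_knee_crf(crfs, recalls, eps=0.02))  # 30
```

A fixed tolerance keeps the rule easy to reason about; a curvature-based knee detector such as Kneedle is a common alternative. Either way, the knee found on measured curves would serve as the label that a learned predictor is trained to reproduce.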
With the emergence of technologies such as smart cities, the Internet of Things (IoT), and 5G, the amount of visual data produced at edge and remote nodes has exploded. Because a considerable portion of captured video is consumed by a machine learning task rather than a human audience, transmitting video in such applications requires efficient compression tailored to machine vision. However, existing compression solutions are optimized for human vision. This paper presents a methodology to optimize an existing video compression standard, HEVC, for a machine vision task, object detection (OD). To this end, (1) a dataset of compressed videos, covering several compression ratios and their corresponding OD performance, is collected to enable modeling; (2) a trade-off point (knee-point) between bitrate and OD performance is defined as the point beyond which further increases in bitrate yield no major improvement in OD performance; and (3) a set of features is extracted and studied to model this point via a practical machine learning method. The resulting solution predicts the knee-point with a mean absolute error (MAE) of 1.28, incurring a recall drop (ΔRecall) of only 0.012 and a bitrate reduction of 86.56% compared with OD on very high-quality video.
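The reported figures can be read as follows under assumed definitions: MAE averages the absolute difference between predicted and ground-truth knee points, bitrate reduction compares the bitrate at the knee with that of the very high-quality reference, and the recall drop (ΔRecall) is the recall lost by encoding at the knee instead of the reference. The sketch below computes these for hypothetical numbers; the function names and inputs are illustrative, and the abstract does not state how the paper aggregates results across videos.

```python
from typing import Sequence


def mean_absolute_error(true_knees: Sequence[float], pred_knees: Sequence[float]) -> float:
    """MAE between ground-truth and predicted knee points (e.g. in CRF units)."""
    if len(true_knees) != len(pred_knees) or not true_knees:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_knees, pred_knees)) / len(true_knees)


def bitrate_reduction_pct(ref_kbps: float, knee_kbps: float) -> float:
    """Percent bitrate saved by encoding at the knee instead of the reference."""
    return 100.0 * (1.0 - knee_kbps / ref_kbps)


def delta_recall(ref_recall: float, knee_recall: float) -> float:
    """Recall lost by encoding at the knee instead of the reference (assumed definition)."""
    return ref_recall - knee_recall


# Hypothetical numbers, for illustration only (not the paper's data).
print(mean_absolute_error([30, 26, 34], [31, 26, 32]))  # 1.0
print(round(bitrate_reduction_pct(4000.0, 600.0), 2))   # 85.0
print(round(delta_recall(0.80, 0.785), 3))              # 0.015
```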

