Journal
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
Volume 176, Issue -, Pages 139-150
Publisher
ELSEVIER
DOI: 10.1016/j.isprsjprs.2021.04.004
Keywords
Video object detection; Plug & Play; Convolutional regression tracker; Deep learning; Tracking
Video object detection is less researched than object detection in images because of a shortage of labelled video datasets. Frames in a video clip are highly correlated, so more video labels are needed for good data variation. The authors propose to improve the performance of an image object detector by augmenting it with a class-agnostic convolutional regression tracker.
Video object detection is a fundamental research task for scene understanding. Compared with object detection in images, object detection in videos has been less researched because of a shortage of labelled video datasets. As frames in a video clip are highly correlated, a larger quantity of video labels is needed to obtain good data variation, and these labels are not always available because they are much more expensive to obtain. As a result, it is easy to train an image object detector, but not always possible to train a video object detector when there are insufficient video labels for certain classes. To deal with this problem and improve the performance of an image object detector on classes without video labels, we propose to augment a well-trained image object detector with an efficient and effective class-agnostic convolutional regression tracker for the video object detection task. The tracker learns to track objects by reusing the features from the image object detector, making it a lightweight addition to the detector that causes only a slight drop in speed. The performance of our model is evaluated on the large-scale ImageNet VID dataset. Our strategy improves the mean average precision (mAP) score by around 5% for the image object detector and by around 3% for the image object detector plus Seq-NMS post-processing.
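The abstract does not say how tracked boxes and per-frame detections are combined, so the following is only a minimal sketch of one plausible fusion step, not the authors' method. It assumes the tracker has already moved the previous frame's boxes into the current frame and that those boxes carry the class labels the detector assigned earlier, which is what makes a class-agnostic tracker usable for every class. The names `Box`, `iou`, `fuse`, `iou_thr` and `decay` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: str


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def fuse(tracked: List[Box], detected: List[Box],
         iou_thr: float = 0.5, decay: float = 0.9) -> List[Box]:
    """Hypothetical fusion rule (an assumption, not taken from the paper).

    Each tracked box carries its label and a decayed score from the
    previous frame. If it overlaps a current box, the higher score wins;
    otherwise it is kept, which recovers objects the detector missed.
    """
    fused = list(detected)
    for t in tracked:
        carried = replace(t, score=t.score * decay)
        # Index of the current box that overlaps the tracked box the most.
        best = max(range(len(fused)),
                   key=lambda i: iou(carried, fused[i]), default=None)
        if best is None or iou(carried, fused[best]) < iou_thr:
            fused.append(carried)
        elif carried.score > fused[best].score:
            fused[best] = replace(fused[best],
                                  score=carried.score, label=carried.label)
    return fused
```

In this sketch a weak detection that overlaps a confident track inherits the track's decayed score, and a track with no overlapping detection is carried forward as its own box. A real system would also have to decide when to drop a track, which is left out here.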