Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 31, Issue 5, Pages 1995-2007
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.3014491
Keywords
Streaming media; Dynamics; Visualization; Analytical models; Task analysis; Image retrieval; Training data; Fine-grained video retrieval; sketch-based video retrieval; sketch dataset; cross-modal matching; triplet ranking; meta-learning inspired techniques
Funding
- National Natural Science Foundation of China (NSFC) [61922015, 61773071]
- Beijing University of Posts and Telecommunications (BUPT)
This research introduces a novel fine-grained instance-level sketch-based video retrieval problem and dataset, and proposes a multi-stream multi-modality deep network with a relation module to improve the matching of visual appearance and motion at a fine-grained level. The results show that this model outperforms existing state-of-the-art models designed for video analysis.
Existing sketch-analysis work studies sketches depicting static objects or scenes. In this work, we propose a novel cross-modal retrieval problem, fine-grained instance-level sketch-based video retrieval (FG-SBVR), where a sketch sequence is used as a query to retrieve a specific target video instance. Compared with sketch-based still-image retrieval and coarse-grained category-level video retrieval, this problem is more challenging because both visual appearance and motion need to be matched simultaneously at a fine-grained level. We contribute the first FG-SBVR dataset with rich annotations. We then introduce a novel multi-stream multi-modality deep network to perform FG-SBVR under both strongly and weakly supervised settings. The key component of the network is a relation module, designed to prevent model overfitting given scarce training data. We show that this model significantly outperforms a number of existing state-of-the-art models designed for video analysis.
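The keywords list triplet ranking for cross-modal matching. As a rough illustration of that general idea (not the paper's actual formulation), the sketch below shows a generic hinge-style triplet ranking loss over embeddings: a sketch query should lie closer to its matching video than to a non-matching one by at least a margin. The Euclidean distance, the margin of 0.2, and the helper names `l2_distance`, `triplet_ranking_loss`, and `rank_videos` are all hypothetical assumptions. The toy 2-D vectors stand in for outputs of the sketch and video streams.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_ranking_loss(anchor, positive, negative, margin=0.2):
    """Generic hinge triplet loss: max(0, margin + d(anchor, pos) - d(anchor, neg)).

    Assumption: Euclidean distance and margin=0.2 are illustrative choices,
    not values taken from the FG-SBVR paper.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def rank_videos(query, gallery):
    """Return gallery indices sorted from closest to farthest from the query."""
    return sorted(range(len(gallery)), key=lambda i: l2_distance(query, gallery[i]))


if __name__ == "__main__":
    sketch = [0.0, 1.0]        # hypothetical sketch-sequence embedding
    video_pos = [0.0, 0.9]     # hypothetical embedding of the target video
    easy_neg = [1.0, 0.0]      # distant non-matching video
    hard_neg = [0.0, 1.05]     # non-matching video that sits too close

    # The easy negative already satisfies the margin, so it contributes no loss.
    print(triplet_ranking_loss(sketch, video_pos, easy_neg))
    # The hard negative violates the margin and yields a positive loss (about 0.25).
    print(triplet_ranking_loss(sketch, video_pos, hard_neg))
    # At retrieval time, videos are ranked by distance to the sketch query.
    print(rank_videos(sketch, [easy_neg, video_pos]))
```

Only triplets that violate the margin produce a gradient, which is why such losses are usually paired with mining of hard negatives. Ranking by embedding distance is the standard retrieval step once such a model is trained.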