
Boosting Temporal Binary Coding for Large-Scale Video Search

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 23, Issue -, Pages 353-364

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2020.2978593

Keywords

Visualization; Binary codes; Boosting; Encoding; Semantics; Indexing; Feature extraction; Large-scale video search; binary code learning; locality-sensitive hashing; temporal consistency; multi-table indexing

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Funding

National Natural Science Foundation of China [61872021, 61690202]
Beijing Nova Program of Science and Technology [Z191100001119050]

Abstract

This paper studies the multi-table learning problem for video search, aiming to learn binary codes that capture intrinsic video similarities from both visual and temporal aspects in order to build multiple hash table indices. Under a boosting learning framework, the binary codes, hash functions, and temporal variation of each table are optimized jointly and efficiently.

In recent years, the amount of visual data has grown explosively. Hashing techniques have been successfully applied to large-scale nearest neighbor search over data at this scale. However, existing hashing methods usually learn a single hash code for each data point and take only the content correlations among the points into account. In practice, complex visual data such as video exhibits strong temporal relations among successive frames. Moreover, delivering the desired performance for large-scale video search requires multiple hash codes for each data point so that multiple hash table indices can be built. To address these problems, in this paper, we first study the multi-table learning problem for video search and attempt to learn binary codes that capture the intrinsic video similarities from both the visual and the temporal aspects. By regarding the search over multiple tables as an ensemble prediction, the whole multi-table learning problem can be solved in a boosting manner so that the tables complementarily cover the nearest neighbors. For each table, we devise a temporal binary coding solution that simultaneously considers the intrinsic relations among the visual content and the temporal consistency among successive frames. More specifically, we approximate the intrinsic visual similarities using a low-rank matrix based on a sparse, non-negative feature representation. Furthermore, to preserve the temporal consistency, we introduce a subspace rotation to model the variation among successive frames. Under the boosting learning framework, the binary codes, hash functions, and temporal variation of each table can be optimized jointly and efficiently. Extensive experiments on three large video datasets demonstrate that the proposed approach significantly outperforms a number of state-of-the-art hashing methods.
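To make the idea of searching over multiple hash tables concrete, the following is a minimal sketch of multi-table lookup in which the candidate set is the union of the matching buckets from every table. It is an illustration under stated assumptions, not the method of the paper: random hyperplane locality-sensitive hashing stands in for the learned hash functions, and the boosting-based table learning, the low-rank similarity approximation, and the subspace rotation are not shown. The names `build_tables` and `query` and all parameter values are hypothetical.

```python
import numpy as np

def build_tables(data, num_tables=4, num_bits=8, seed=0):
    """Build several hash tables, each with its own random hyperplanes.

    Assumption: random projections replace the learned hash functions.
    """
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(num_tables):
        # One column per bit: the sign of the projection gives that bit.
        planes = rng.standard_normal((data.shape[1], num_bits))
        codes = data @ planes > 0  # boolean array of shape (n, num_bits)
        buckets = {}
        for idx, code in enumerate(codes):
            # The raw bytes of the binary code serve as the bucket key.
            buckets.setdefault(code.tobytes(), []).append(idx)
        tables.append((planes, buckets))
    return tables

def query(tables, x):
    """Return the union of the bucket hits for `x` across all tables."""
    candidates = set()
    for planes, buckets in tables:
        key = (x @ planes > 0).tobytes()
        candidates.update(buckets.get(key, []))
    return candidates

# Usage: index 200 random 16-dimensional vectors and look up the first one.
data = np.random.default_rng(1).standard_normal((200, 16))
tables = build_tables(data)
hits = query(tables, data[0])
```

Each table on its own returns only the points that share an exact code with the query, so adding tables enlarges the candidate set. The abstract describes learning the tables so that they cover the nearest neighbors complementarily, which independent random tables like these do not attempt.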
