Journal
IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 31, Issue -, Pages 6032-6047
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2022.3205210
Keywords
Head; Location awareness; Task analysis; Feature extraction; Annotations; Convolutional neural networks; Context modeling; Video crowd analysis; Gaussian neighborhood attention; VSCrowd dataset; spatial-temporal modeling
The paper proposes a new method for video crowd localization that models the spatial-temporal dependencies of human mobility using multi-focus Gaussian neighborhood attention. The authors develop a unified neural network to accurately locate head centers in video clips, and they introduce a large-scale crowd video benchmark to support future research in this field.
Video crowd localization is a crucial yet challenging task, which aims to estimate the exact locations of human heads in crowded videos. To model the spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighborhood attention (GNA), which can effectively exploit long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, our GNA captures the scale variation of human heads well through its multi-focus mechanism. Based on the multi-focus GNA, we develop a unified neural network called GNANet to accurately locate head centers in video clips by fully aggregating spatial-temporal information via a scene modeling module and a context cross-attention module. Moreover, to facilitate future research in this field, we introduce a large-scale crowd video benchmark named VSCrowd (https://github.com/HopLee6/VSCrowd), which consists of 60K+ frames captured in various surveillance scenes and 2M+ head annotations. Finally, we conduct extensive experiments on three datasets including our VSCrowd, and the experimental results show that the proposed method achieves state-of-the-art performance for both video crowd localization and counting.
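To make the core idea concrete, the following is a minimal, hedged sketch of Gaussian neighborhood attention: standard dot-product attention whose logits are biased by a Gaussian of the spatial distance between positions, so each query mainly attends to its spatial neighborhood, and a multi-focus variant that averages over several bandwidths to accommodate head-scale variation. The function names, the `sigmas` values, and the averaging scheme are illustrative assumptions, not the paper's exact GNANet formulation.

```python
import numpy as np

def gaussian_neighborhood_attention(q, k, v, coords, sigma):
    """Attention with logits biased by a Gaussian of spatial distance.

    q, k, v: (N, d) per-position queries, keys, values.
    coords:  (N, 2) pixel coordinates (y, x) of the N positions.
    sigma:   Gaussian bandwidth; larger sigma widens the focus region.
    (Illustrative sketch; the paper's actual GNA may differ in detail.)
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                       # (N, N) content similarity
    diff = coords[:, None, :] - coords[None, :, :]      # pairwise offsets
    dist2 = (diff ** 2).sum(-1)                         # squared distances
    logits = logits - dist2 / (2.0 * sigma ** 2)        # Gaussian spatial bias
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)           # row-wise softmax
    return weights @ v                                  # (N, d) attended features

def multi_focus_gna(q, k, v, coords, sigmas=(1.0, 2.0, 4.0)):
    """Average GNA outputs over several bandwidths (a simple multi-focus stand-in)."""
    outs = [gaussian_neighborhood_attention(q, k, v, coords, s) for s in sigmas]
    return np.mean(outs, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 16, 8
    q, k, v = (rng.normal(size=(N, d)) for _ in range(3))
    coords = rng.uniform(0, 10, size=(N, 2))
    out = multi_focus_gna(q, k, v, coords)
    print(out.shape)  # (16, 8)
```

Because the Gaussian bias decays with distance, faraway positions receive near-zero attention weight, which preserves spatial locality while still allowing long-range correspondences through the wider-sigma branches.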