Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 30, Issue 10, Pages 3651-3662
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2019.2943010
Keywords
Estimation; Distortion; Cameras; Task analysis; Convolutional neural networks; Australia; Fuses; Crowd counting; depth embedding; perspective distortion; scale variation
Funding
- National Science Fund of China [61571297, 61420106008]
- National Key Research and Development Program [2017YFB1002401]
- 111 Program [B07022]
- Science and Technology Commission of Shanghai Municipality (STCSM) [18DZ2270700, 18DZ1112300]
Scale variation of pedestrians in crowd images presents a significant challenge for vision-based people counting systems. Such variation is mainly caused by perspective distortion arising from the camera pose relative to the ground plane. Following the density-based counting paradigm, we postulate that generating density values adapted to object scale is critical to the accuracy of the final count. Motivated by this, and because an object's image scale is inversely proportional to its depth, we distill information from depth cues to obtain scale-aware representations that respond to object scale. Specifically, we propose a depth embedding module that can be added to existing networks. This module exploits depth cues to spatially re-calibrate the magnitude of the original features. In this way, objects of the same class attain distinct representations according to their scales, which directly benefits the estimation of scale-aware density values. We conduct a comprehensive analysis of the effects of the depth embedding module and validate that exploiting depth cues to perceive object scale variation in convolutional neural networks improves crowd counting performance. Experiments on four popular benchmark datasets demonstrate the effectiveness of the proposed approach.
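The two ideas in the abstract, that image scale falls off inversely with depth and that depth can spatially re-weight feature magnitudes, can be illustrated with a small pure-Python sketch. This is not the paper's module: the pinhole formula is the standard projection model, while the per-pixel sigmoid gate and the names `pinhole_scale`, `depth_gate`, `recalibrate`, `weight`, and `bias` are hypothetical stand-ins chosen only to show what "spatial re-calibration by depth" could look like.

```python
import math
from typing import List

# A 2-D map indexed as [row][col]; feature tensors are lists of maps, one per channel.
Matrix = List[List[float]]


def pinhole_scale(focal_px: float, height_m: float, depth_m: float) -> float:
    """Projected height (pixels) of an object of real height height_m at depth depth_m.

    Standard pinhole model: size is inversely proportional to depth.
    """
    return focal_px * height_m / depth_m


def depth_gate(depth: Matrix, weight: float, bias: float) -> Matrix:
    """HYPOTHETICAL gate: sigmoid(weight * depth + bias) per pixel, in (0, 1).

    In a real network weight/bias would be learned; here they are plain scalars.
    """
    return [[1.0 / (1.0 + math.exp(-(weight * d + bias))) for d in row] for row in depth]


def recalibrate(features: List[Matrix], depth: Matrix, weight: float, bias: float) -> List[Matrix]:
    """Scale every channel of the feature map by the same depth-derived spatial gate."""
    gate = depth_gate(depth, weight, bias)
    return [
        [[f * g for f, g in zip(frow, grow)] for frow, grow in zip(channel, gate)]
        for channel in features
    ]


if __name__ == "__main__":
    # A 1.7 m pedestrian seen by a 1000 px focal-length camera at 10 m and 20 m.
    print(pinhole_scale(1000.0, 1.7, 10.0), pinhole_scale(1000.0, 1.7, 20.0))
    # Identical features at a near pixel (depth 2) and a far pixel (depth 10)
    # end up with different magnitudes after re-calibration.
    print(recalibrate([[[1.0, 1.0]]], [[2.0, 10.0]], weight=-0.5, bias=2.0))
```

With a negative `weight`, nearer pixels (larger projected objects) keep more of their feature magnitude than distant ones, so two pedestrians with identical appearance features receive distinct representations; the sign and form of the gate are assumptions here, not details taken from the paper.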