Journal
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19)
Volume -, Issue -, Pages 1823-1832
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3343031.3350881
Keywords
Crowd counting; density map estimation; scale-aware attention; dilated convolution; deformable convolution
Funding
- National Natural Science Foundation of China (NSFC) [61725203, 61732008, 61876058, 61632007]
Most existing CNN-based methods for crowd counting suffer from large scale variation in the objects of interest, which leads to low-quality density maps. In this paper, we propose a novel deep model called Dilated-Attention-Deformable ConvNet (DADNet), which consists of two schemes: multi-scale dilated attention and deformable convolutional DME (Density Map Estimation). The proposed model uses a scale-aware attention fusion with various dilation rates to capture different visual granularities of crowd regions of interest, and uses deformable convolutions to generate a high-quality density map. This design has two merits: (1) varying dilation rates can effectively identify discriminative regions by enlarging the receptive fields of the convolutional kernels to take in cues from the surrounding region, and (2) deformable CNN operations improve the accuracy of object localization in the density map by augmenting the spatial sampling of object locations with adaptive offsets and scalars. DADNet not only captures the rich spatial context of salient and tiny regions of interest simultaneously, but also remains robust to background noise, such as partially occluded objects. Extensive experiments on benchmark datasets verify that DADNet achieves state-of-the-art performance. Visualization results of the multi-scale attention maps further validate the interpretability of our solution.
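As a rough illustration of merit (1), the sketch below shows how a dilation rate widens the region a fixed-size kernel covers without adding weights. It is not the DADNet implementation: the function names, the 3-tap kernel, and the dilation rates 1, 2, and 3 are assumptions chosen only for illustration, and the abstract does not state which rates the model uses.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent (in input samples) spanned by one dilated kernel along one axis."""
    # A dilation of d places d - 1 skipped samples between adjacent kernel taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated cross-correlation with no padding, on plain Python lists."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each kernel tap i reads the input at an offset of i * dilation.
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out


# Hypothetical dilation rates, for illustration only.
rates = [1, 2, 3]
extents = {d: effective_kernel_size(3, d) for d in rates}

# The same 3 weights cover 5 input samples when the dilation rate is 2.
response = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], dilation=2)
```

With a 3-tap kernel, the covered extent grows from 3 to 5 to 7 samples as the rate goes from 1 to 3, which is the sense in which varying the rate lets a branch respond to smaller or larger surrounding context.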