4.6 Article

Crowd Counting by Multi-Scale Dilated Convolution Networks

Journal

ELECTRONICS
Volume 12, Issue 12, Pages -

Publisher

MDPI
DOI: 10.3390/electronics12122624

Keywords

deep learning; crowd counting; density map estimation; spatial pyramid pooling (SPP); multi-scale feature extraction; dilated convolution

The number of people in a crowd is crucial in various fields, and the accuracy of counting in public spaces is often compromised by uneven crowd distribution and differences in head scale due to varying distances from the camera. To address these issues, a deep learning crowd counting model called multi-scale dilated convolution networks (MSDCNet) is proposed, based on crowd density map estimation. The model consists of a front-end network, a core network, and a back-end network, all designed to extract features and improve counting accuracy. Experimental results on three public datasets demonstrate that the proposed model successfully solves these problems and outperforms representative models in terms of mean absolute error (MAE) and mean square error (MSE).
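As a rough illustration of how density-map counting is usually evaluated, the sketch below sums a density map to get a predicted count and then computes MAE and MSE over a set of images. The function names and the toy numbers are hypothetical and are not taken from the paper. Many crowd counting papers report the square root of the mean squared error under the name MSE; the abstract does not say which convention is used here, so the sketch returns both.

```python
import math


def count_from_density(density_map):
    """Predicted head count = sum of all values in the density map.

    `density_map` is a list of rows of floats (hypothetical toy format).
    """
    return sum(sum(row) for row in density_map)


def count_errors(predicted, actual):
    """Return (MAE, MSE, RMSE) between predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    return mae, mse, math.sqrt(mse)         # RMSE is often reported as "MSE"


if __name__ == "__main__":
    # Toy counts for three images (made-up values, for illustration only).
    mae, mse, rmse = count_errors([10, 20, 30], [12, 20, 26])
    print(mae, mse, rmse)
```

MAE reflects the average counting error, while the squared-error variants penalize large misses on individual images more heavily.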
The number of people in a crowd is crucial information in public safety, intelligent monitoring, traffic management, architectural design, and other fields. At present, counting accuracy in public spaces remains compromised by some unavoidable conditions, such as the uneven distribution of a crowd and the differences in head scale caused by people's differing distances from the camera. To solve these problems, we propose a deep learning crowd counting model, multi-scale dilated convolution networks (MSDCNet), based on crowd density map estimation. MSDCNet consists of three parts. The front-end network uses a truncated VGG16 to obtain preliminary features of the input image, with a proposed spatial pyramid pooling (SPP) module replacing the max-pooling layer to extract scale-invariant features. The core network is our proposed multi-scale feature extraction network (MFENet), which extracts features at three different scales. The back-end network consists of consecutive dilated convolution layers, instead of the traditional alternating convolution and pooling layers, to expand the receptive field, extract high-level semantic information, and avoid losing the spatial features of small-scale heads. The experimental results on three public datasets show that the proposed model solved the above problems satisfactorily and obtained better counting accuracy than representative models in terms of mean absolute error (MAE) and mean square error (MSE).
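The trade-off behind the back-end design can be sketched with standard receptive-field arithmetic. The example below compares a stack of dilated 3x3 convolutions with an alternating convolution and pooling stack. The layer counts, the dilation rate of 2, and the 2x2 pooling are assumptions chosen for illustration; the abstract does not give MSDCNet's actual layer configuration.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Return (receptive field, total stride) for a stack of layers.

    Each layer is a tuple (kernel_size, stride, dilation).
    The total stride is the factor by which spatial resolution shrinks.
    """
    rf, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf, jump


if __name__ == "__main__":
    # Hypothetical stacks, not the paper's configuration.
    plain = [(3, 1, 1)] * 3                 # three ordinary 3x3 convolutions
    dilated = [(3, 1, 2)] * 3               # three 3x3 convolutions, dilation 2
    conv_pool = [(3, 1, 1), (2, 2, 1)] * 3  # 3x3 conv + 2x2 pooling, three times

    print(receptive_field(plain))      # (7, 1)
    print(receptive_field(dilated))    # (13, 1)
    print(receptive_field(conv_pool))  # (22, 8)
```

In this toy comparison, the dilated stack nearly doubles the receptive field of the plain stack while keeping a total stride of 1, so the feature map stays at full resolution. The pooling stack reaches a larger receptive field but downsamples by a factor of 8, which is the kind of spatial detail loss for small heads that the abstract refers to.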

