Proceedings Paper

Boosting Crowd Counting via Multifaceted Attention

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/CVPR52688.2022.01901

Funding

  1. National Key Research and Development Project of China [2019YFB1312000]
  2. National Natural Science Foundation of China [62076195, U20B2052]

This paper presents a Multifaceted Attention Network (MAN) to deal with the challenging task of crowd counting. The proposed network incorporates global attention, learnable local attention, and instance attention to enhance the performance of Transformer models in local spatial relation encoding.
This paper focuses on the challenging task of crowd counting. Because large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle such variations well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. MAN incorporates global attention from the vanilla transformer, learnable local attention, and instance attention into a counting model. First, the local Learnable Region Attention (LRA) is proposed to dynamically assign exclusive attention to each feature location. Second, we design the Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention for different feature locations. Finally, we provide an Instance Attention mechanism to focus dynamically on the most important instances during training. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU, have validated the proposed method.
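
The abstract does not give formulas, so the following is only a rough sketch of the two local-attention ideas it names, not the authors' implementation. It assumes a 1-D sequence of feature locations, a hypothetical per-location window `half_widths` standing in for the learnable region, and a variance penalty standing in for "minimizing the deviation among the attention for different feature locations". All function names are illustrative.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(features):
    """Vanilla scaled dot-product self-attention weights (queries = keys)."""
    scale = math.sqrt(len(features[0]))
    attn = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        attn.append(softmax(scores))
    return attn

def local_region_attention(attn, half_widths):
    """Keep, for location i, only the attention inside its own region.

    ASSUMPTION: the region is a 1-D window |i - j| <= half_widths[i];
    in a trained model these widths would be learned, here they are given.
    """
    n = len(attn)
    return [
        [attn[i][j] if abs(i - j) <= half_widths[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def deviation_penalty(local_attn):
    """ASSUMPTION: penalize the variance of attention mass across locations."""
    mass = [sum(row) for row in local_attn]
    mean = sum(mass) / len(mass)
    return sum((m - mean) ** 2 for m in mass) / len(mass)

# Toy input: four feature locations with two channels each.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
attn = global_attention(features)
local = local_region_attention(attn, half_widths=[1, 1, 2, 3])
penalty = deviation_penalty(local)
```

In this toy setup, a location with a wide window keeps most of its global attention mass while a location with a narrow window keeps less, and the penalty grows as those masses diverge. If every window covers the whole sequence, each row keeps a mass of 1 and the penalty is zero up to floating-point rounding.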
