Journal
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
Volume -, Issue -, Pages 19596-19605
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/CVPR52688.2022.01901
Funding
- National Key Research and Development Project of China [2019YFB1312000]
- National Natural Science Foundation of China [62076195, U20B2052]
This paper presents a Multifaceted Attention Network (MAN) to deal with the challenging task of crowd counting. The proposed network incorporates global attention, learnable local attention, and instance attention to enhance the performance of Transformer models in local spatial relation encoding.
This paper focuses on the challenging task of crowd counting. Because large scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle such variations well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. MAN incorporates global attention from the vanilla transformer, learnable local attention, and instance attention into a counting model. First, a local Learnable Region Attention (LRA) is proposed to dynamically assign an exclusive attention region to each feature location. Second, we design a Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention regions of different feature locations. Finally, we provide an Instance Attention mechanism that dynamically focuses on the most important instances during training. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU, validate the proposed method.
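The idea of combining global attention with a learnable, location-specific local region can be sketched in a toy form. The code below is a minimal illustration, not the paper's actual LRA: it assumes a 1-D sequence of feature locations and models each location's "region" as a soft Gaussian mask with a per-location center and width (which would be learned parameters in a real model), added to standard scaled dot-product attention scores. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_attention(q, k, v, centers, widths):
    """Toy sketch of locally restricted attention: each query location i
    attends globally, but a soft Gaussian mask biases it toward keys near
    centers[i] with spread widths[i]. In a real model, centers and widths
    would be learnable, playing the role of a per-location region."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n) global attention scores
    pos = np.arange(n)
    # log-space Gaussian mask: penalize keys far from each query's region center
    mask = -((pos[None, :] - centers[:, None]) ** 2) / (2.0 * widths[:, None] ** 2)
    weights = softmax(scores + mask)  # rows sum to 1
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 4
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))
centers = np.arange(n, dtype=float)  # each location centered on itself
widths = np.full(n, 2.0)             # would be learned per location
out = region_attention(q, k, v, centers, widths)
print(out.shape)  # (8, 4)
```

With a large width the mask vanishes and this reduces to ordinary global attention; shrinking a location's width restricts it to a local neighborhood, which is the kind of per-location flexibility a fixed-size attention window cannot provide.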