Journal
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
Volume -, Issue -, Pages 19596-19605
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/CVPR52688.2022.01901
Keywords
-
Funding
- National Key Research and Development Project of China [2019YFB1312000]
- National Natural Science Foundation of China [62076195, U20B2052]
This paper presents a Multifaceted Attention Network (MAN) to deal with the challenging task of crowd counting. The proposed network incorporates global attention, learnable local attention, and instance attention to enhance the performance of Transformer models in local spatial relation encoding.
This paper focuses on the challenging crowd counting task. As large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle such variations well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. MAN incorporates global attention from the vanilla transformer, learnable local attention, and instance attention into a counting model. Firstly, the local Learnable Region Attention (LRA) is proposed to dynamically assign attention exclusively to each feature location. Secondly, we design the Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention for different feature locations. Finally, we provide an Instance Attention mechanism to focus on the most important instances dynamically during training. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU, have validated the proposed method.
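The abstract does not give the formula for the Local Attention Regularization, so the following is only an illustrative sketch of one possible reading of "minimizing the deviation among the attention for different feature locations". It assumes each feature location holds a row of non-negative attention weights and penalizes the variance of the per-location totals. The function name `attention_deviation_penalty` and the toy matrices are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def attention_deviation_penalty(attention):
    """Hypothetical regularizer (an assumption, not the paper's formula).

    `attention` is a list of rows; row i holds the non-negative attention
    weights that feature location i assigns to all locations. The penalty
    is the population variance of the per-location attention totals, so it
    is zero when every location receives the same amount of attention.
    """
    totals = [sum(row) for row in attention]
    return pvariance(totals)


# Toy example: every location spends the same total attention.
balanced = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
# Toy example: one location dominates the others.
skewed = [[1.0, 1.0, 1.0], [0.1, 0.0, 0.0], [0.0, 0.2, 0.0]]

balanced_penalty = attention_deviation_penalty(balanced)
skewed_penalty = attention_deviation_penalty(skewed)
```

Under this assumed definition the balanced map incurs no penalty while the skewed map is penalized, which matches the stated goal of keeping attention consistent across feature locations. The actual regularizer in MAN may be defined differently.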