TransCrowd: weakly-supervised crowd counting with transformers

Journal

SCIENCE CHINA-INFORMATION SCIENCES
Volume 65, Issue 6

Publisher

SCIENCE PRESS
DOI: 10.1007/s11432-021-3445-y

Keywords

crowd counting; visual transformer; weakly supervised; crowd analysis; transformer

Funding

  1. National Key R&D Program of China [2018YFB1004600]

This paper proposes TransCrowd, a weakly-supervised crowd counting method based on transformers. By utilizing the self-attention mechanism of transformers, TransCrowd effectively extracts semantic crowd information, addressing the limited receptive fields for context modeling in traditional CNN methods. Experiments show that TransCrowd outperforms other weakly-supervised CNN methods and achieves competitive performance compared to some fully-supervised counting methods.
Mainstream crowd counting methods usually use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is an expensive and laborious process. Moreover, point-level annotations are not used to evaluate counting accuracy during the testing phase, which makes them redundant for evaluation. Hence, it is desirable to develop weakly-supervised counting methods that rely only on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt a CNN to regress the total count of the crowd in an image-to-count paradigm. However, a limited receptive field for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. As a result, they cannot achieve satisfactory performance and have limited real-world applications. The transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP) that has a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from a sequence-to-count perspective based on transformers. We observe that the proposed TransCrowd can effectively extract semantic crowd information by using the self-attention mechanism of the transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods and highly competitive counting performance compared with some popular fully-supervised counting methods.
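
The sequence-to-count idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the patch size, the single attention head, the mean pooling over tokens, the linear regression head, the L1 loss, and every function and parameter name below are assumptions chosen for illustration. The sketch only shows how an image can be turned into a token sequence, how self-attention lets every patch attend to every other patch (a global receptive field), and how a single scalar count can be regressed and compared with a count-level label.

```python
import math
import random


def image_to_patch_sequence(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch tokens."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over the whole token sequence."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]  # each row sums to 1 across all patches
    return matmul(weights, v), weights


def init_params(token_dim, model_dim, seed=0):
    """Hypothetical random initialization; a real model would be trained."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"wq": mat(token_dim, model_dim), "wk": mat(token_dim, model_dim),
            "wv": mat(token_dim, model_dim),
            "head": [rng.uniform(-0.1, 0.1) for _ in range(model_dim)],
            "bias": 0.0}


def predict_count(image, params, patch=4):
    """Sequence-to-count: patches -> attention -> mean pool -> scalar count."""
    tokens = image_to_patch_sequence(image, patch)
    attended, _ = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    dim = len(attended[0])
    pooled = [sum(t[d] for t in attended) / len(attended) for d in range(dim)]
    return sum(p * w for p, w in zip(pooled, params["head"])) + params["bias"]


def l1_count_loss(predicted, true_count):
    """Count-level supervision: only the total count is needed, no point labels."""
    return abs(predicted - true_count)
```

With untrained random weights the predicted count is meaningless. The point of the sketch is the data flow: the only supervision signal that enters `l1_count_loss` is one number per image, which is what makes the setting weakly supervised.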
