Article

What-Where-When Attention Network for video-based person re-identification

Journal

NEUROCOMPUTING
Volume 468, Pages 33-47

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.10.018

Keywords

Person re-identification; What-Where-When Attention; Spatial-temporal feature; Graph attention network; Attribute; Identity

Funding

  1. National Natural Science Foundation of China [61801437, 61871351, 61971381, 61461025, 61871259, 61811530325 (IECnNSFCn170396), 61861024]
  2. Natural Science Foundation of Shanxi Province [201801D221206, 201801D221207]
  3. Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi [2020L0683]
  4. Key research and development plan of Luliang City [2020GXZDYF21]

Abstract

Video-based person re-identification is critical in intelligent video surveillance, and existing methods often use attention mechanisms to address challenging variations. However, these methods mainly focus on occlusion and neglect other informative spatial regions and temporal cues in video frames. This paper proposes a comprehensive attention mechanism, the What-Where-When Attention Network (W3AN), which learns discriminative spatial-temporal features for person re-identification. Experimental results demonstrate the effectiveness of the W3AN model, and the contributions of its major modules are clarified in the discussion.
Video-based person re-identification plays a critical role in intelligent video surveillance by learning temporal correlations from consecutive video frames. Most existing methods aim to handle challenging variations in pose, occlusion, background, and so on by using attention mechanisms. Almost all of them concentrate on occlusion and learn occlusion-invariant video representations by discarding the occluded areas or frames, even though the remaining areas in those frames contain rich spatial information and temporal cues. To overcome these drawbacks, this paper proposes a comprehensive attention mechanism covering what, where, and when to pay attention in discriminative spatial-temporal feature learning, namely the What-Where-When Attention Network (W3AN). Concretely, W3AN designs a spatial attention module that focuses on pedestrian identity and salient attributes through an importance-estimating layer (What and Where), and a temporal attention module that calculates frame-level importance (When), which is embedded into a graph attention network to exploit temporal attention features rather than computing a weighted-average feature over video frames as existing methods do. Moreover, experiments on three widely recognized datasets demonstrate the effectiveness of the proposed W3AN model, and the discussion of the major modules elaborates the contributions of this paper. (c) 2021 Elsevier B.V. All rights reserved.
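The abstract contrasts the temporal attention module's graph-attention aggregation with the weighted-average pooling of earlier methods. A minimal sketch of that idea, using a single-head graph attention layer (in the style of Veličković et al.'s GAT) over per-frame features treated as nodes of a fully connected temporal graph; all shapes, names, and parameters here are illustrative assumptions, not the paper's actual W3AN implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_graph_attention(frame_feats, W, a):
    """Single-head graph-attention aggregation over T frame features.

    frame_feats: (T, D) per-frame features; each frame is a node in a
    fully connected temporal graph. W: (D, Dp) shared projection.
    a: (2*Dp,) attention vector. Instead of a single weighted average
    of frames, every frame attends to every other frame, so each node
    gets its own temporally refined feature.
    """
    h = frame_feats @ W                        # projected nodes, (T, Dp)
    dp = h.shape[1]
    # Pairwise logits e_ij = LeakyReLU(a^T [h_i || h_j]), computed by
    # splitting `a` into its "source" and "target" halves.
    left = h @ a[:dp]                          # (T,)
    right = h @ a[dp:]                         # (T,)
    e = left[:, None] + right[None, :]         # (T, T)
    e = np.where(e > 0, e, 0.2 * e)            # LeakyReLU, slope 0.2
    alpha = softmax(e, axis=1)                 # attention over neighbours
    return alpha @ h                           # refined node features, (T, Dp)

# Usage with random inputs (T=8 frames, D=Dp=16):
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 16))
a = rng.standard_normal(32)
out = temporal_graph_attention(feats, W, a)
```

In a full pipeline the frame-level importance scores ("When") would modulate these attention coefficients before aggregation; the sketch above only shows the graph-attention step that replaces plain weighted averaging.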

