Article

SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 28, Issue 10, Pages 4870-4882

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TIP.2019.2911488

Keywords

Temporal modeling; similarity measurement; collaborative representation; person re-identification; attention mechanism

Funding

  1. National Key Research and Development Program of China [2018YFC0830103]
  2. General Research Fund through the Research Grants Council of Hong Kong [CUHK14202217, CUHK14203118, CUHK14205615, CUHK14207814, CUHK14213616]

Abstract

Video person re-identification has attracted much attention in recent years. It aims to match image sequences of pedestrians across different camera views. Previous approaches typically improve this task in three respects: 1) selecting more discriminative frames; 2) generating more informative temporal representations; and 3) developing more effective distance metrics. To address these issues, we present a novel and practical deep architecture for video person re-identification, termed the self-and-collaborative attention network (SCAN), which takes video pairs as input and outputs their matching scores. SCAN has several appealing properties. First, SCAN adopts a non-parametric attention mechanism to refine the intra-sequence and inter-sequence feature representations of videos, producing a self-and-collaborative feature representation for each video and aligning the discriminative frames of the probe and gallery sequences. Second, going beyond existing models, a generalized pairwise similarity measurement is proposed that forms the similarity feature of a video pair as the Hadamard product of its self-representation difference and collaborative-representation difference, so that the matching result can be predicted by a binary classifier. Third, a dense clip segmentation strategy is introduced to generate rich probe-gallery clip pairs for optimizing the model. In the test phase, the final matching score of two videos is determined by averaging the scores of the top-ranked clip pairs. Extensive experiments demonstrate the effectiveness of SCAN, which surpasses the best-performing baselines in top-1 accuracy on the iLIDS-VID, PRID2011, and MARS datasets.
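The abstract's core mechanism, self-and-collaborative attention followed by a Hadamard-product similarity feature, can be summarized in a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the names nonparametric_attention and scan_similarity, the use of mean-pooled clip descriptors as attention queries, and the dot-product attention form are all choices made here for clarity, and the paper's exact formulation may differ.

```python
# Minimal sketch of self-and-collaborative attention and the pairwise
# similarity feature. Shapes: each clip is a (T, D) tensor of T per-frame
# features of dimension D. All names and design choices here are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def nonparametric_attention(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Pool (T, D) frame features `keys` with softmax dot-product weights
    against a (D,) `query`. Non-parametric: no learned parameters."""
    weights = F.softmax(keys @ query, dim=0)          # (T,)
    return (weights.unsqueeze(1) * keys).sum(dim=0)   # (D,)

def scan_similarity(probe: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """Return a (D,) similarity feature for a probe-gallery clip pair."""
    p_mean, g_mean = probe.mean(dim=0), gallery.mean(dim=0)
    # Self-attention: each clip attends to its own frames.
    p_self = nonparametric_attention(p_mean, probe)
    g_self = nonparametric_attention(g_mean, gallery)
    # Collaborative attention: each clip attends to its frames using the
    # other clip's descriptor, aligning discriminative frames across views.
    p_collab = nonparametric_attention(g_mean, probe)
    g_collab = nonparametric_attention(p_mean, gallery)
    # Hadamard product of the self-representation difference and the
    # collaborative-representation difference; a binary classifier on top
    # predicts whether the two clips show the same person.
    return (p_self - g_self) * (p_collab - g_collab)
```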
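The test-phase protocol, dense clip segmentation plus averaging of top-ranked clip-pair scores, can be sketched in the same spirit. The clip_len, stride, and top_k values below are illustrative defaults rather than the paper's reported settings, and classifier stands in for the trained binary classifier; scan_similarity is reused from the previous sketch.

```python
# Hedged sketch of the test-time matching protocol: segment both videos
# into dense overlapping clips, score every probe-gallery clip pair, and
# average the top-ranked scores. Hyperparameters are illustrative only.
import torch

def dense_clips(frames: torch.Tensor, clip_len: int = 8, stride: int = 4):
    """Split (T, D) frame features into overlapping (clip_len, D) clips."""
    last_start = max(frames.size(0) - clip_len + 1, 1)
    return [frames[s:s + clip_len] for s in range(0, last_start, stride)]

def match_score(probe: torch.Tensor, gallery: torch.Tensor,
                classifier, top_k: int = 10) -> torch.Tensor:
    """Final video-to-video score: mean of the top-k clip-pair scores,
    where `classifier` maps a similarity feature to a matching score."""
    scores = torch.stack([
        classifier(scan_similarity(pc, gc))  # reuses the sketch above
        for pc in dense_clips(probe)
        for gc in dense_clips(gallery)
    ]).flatten()
    k = min(top_k, scores.numel())
    return scores.topk(k).values.mean()
```

Averaging only the highest-scoring clip pairs keeps the final decision robust to poorly aligned or low-quality clips, which matches the motivation given in the abstract.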
