Article

Progressive Unsupervised Person Re-Identification by Tracklet Association With Spatio-Temporal Regularization

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 23, Pages 597-610

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TMM.2020.2985525

Keywords

Cameras; Training; Feature extraction; Data models; Training data; Machine learning; UHDTV; Unsupervised person re-identification; Spatio-temporal regularization; Tracklet association

Funding

  1. National Key R&D Program of China [2018YFB1402600]
  2. National Natural Science Foundation of China [61822208, 61632019]
  3. Youth Innovation Promotion Association CAS [2018497]
  4. NSFC [61836011]


This study introduces a progressive deep learning method for unsupervised person Re-ID, called Tracklet Association with Spatio-Temporal Regularization (TASTR). Experimental results show that, with the spatio-temporal constraint applied in the training phase, the proposed approach outperforms state-of-the-art unsupervised methods by notable margins on one dataset and achieves performance competitive with fully supervised methods on two datasets.
Existing methods for person re-identification (Re-ID) are mostly based on supervised learning, which requires numerous manually labeled samples across all camera views for training. Such a paradigm suffers from a scalability issue, since in real-world Re-ID applications it is difficult to exhaustively label abundant identities over multiple disjoint camera views. To this end, we propose a progressive deep learning method for unsupervised person Re-ID in the wild by Tracklet Association with Spatio-Temporal Regularization (TASTR). In our approach, we first collect tracklet data within each camera by automatic person detection and tracking. Then, an initial Re-ID model is trained based on within-camera triplet construction for person representation learning. After that, based on the person visual features and the spatio-temporal constraint, we associate cross-camera tracklets to generate cross-camera triplets and update the Re-ID model. Lastly, with the refined Re-ID model, better visual features of persons can be extracted, which further promotes the association of cross-camera tracklets. The last two steps are iterated multiple times to progressively upgrade the Re-ID model. To facilitate the study, we have collected a new 4K UHD video dataset named Campus4K with full frames and full spatio-temporal information. Experimental results show that, with the spatio-temporal constraint in the training phase, the proposed approach outperforms state-of-the-art unsupervised methods by notable margins on DukeMTMC-reID, and achieves performance competitive with fully supervised methods on both the DukeMTMC-reID and Campus4K datasets.
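
The abstract describes associating cross-camera tracklets by combining visual similarity with a spatio-temporal constraint. The following is a minimal sketch of how such a regularized association score could be computed; it is not the authors' implementation, and the Gaussian transit-time model, the weighting coefficient lam, and all function names are illustrative assumptions.

```python
# Sketch of spatio-temporally regularized tracklet association scoring.
# Assumptions: L2-normalized appearance descriptors, a Gaussian model of
# camera-to-camera transit time, and a hand-picked fusion weight lam.
import numpy as np

def appearance_distance(feat_a, feat_b):
    """Cosine distance between L2-normalized tracklet descriptors."""
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return 1.0 - float(a @ b)

def spatio_temporal_penalty(exit_time_a, enter_time_b, mean_transit, std_transit):
    """Penalty that grows as the observed transit time between two cameras
    deviates from that camera pair's transit-time statistics (assumed
    Gaussian here purely for illustration)."""
    dt = enter_time_b - exit_time_a
    z = (dt - mean_transit) / std_transit
    return float(z * z)  # squared deviation acts as a soft constraint

def association_score(feat_a, feat_b, exit_a, enter_b,
                      mean_transit, std_transit, lam=0.1):
    """Lower is better: appearance distance regularized by the
    spatio-temporal term with an assumed weight lam."""
    return (appearance_distance(feat_a, feat_b)
            + lam * spatio_temporal_penalty(exit_a, enter_b,
                                            mean_transit, std_transit))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1, f2 = rng.normal(size=128), rng.normal(size=128)
    # Candidate pair: leaves camera 1 at t=100 s, enters camera 2 at t=160 s,
    # with an assumed typical transit time of 60 +/- 10 s for this camera pair.
    print(association_score(f1, f2, exit_a=100.0, enter_b=160.0,
                            mean_transit=60.0, std_transit=10.0))
```

In the progressive scheme described above, pairs with low association scores would be used to build cross-camera triplets for the next round of Re-ID model training, and the improved features would in turn refine the association in later iterations.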
