Journal
IEEE TRANSACTIONS ON MULTIMEDIA
Volume 23, Pages 597-610
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2020.2985525
Keywords
Cameras; Training; Feature extraction; Data models; Training data; Machine learning; UHDTV; Unsupervised person re-identification; Spatio-temporal regularization; Tracklet association
Categories
Funding
- National Key R&D Program of China [2018YFB1402600]
- National Natural Science Foundation of China [61822208, 61632019]
- Youth Innovation Promotion Association CAS [2018497]
- NSFC [61836011]
This study introduces a progressive deep learning method for unsupervised person Re-ID, called Tracklet Association with Spatio-Temporal Regularization (TASTR). Experimental results show that, with the spatio-temporal constraint applied during training, the proposed approach outperforms state-of-the-art unsupervised methods by notable margins on one dataset and achieves performance competitive with fully supervised methods on two other datasets.
Existing methods for person re-identification (Re-ID) are mostly based on supervised learning, which requires numerous manually labeled samples across all camera views for training. Such a paradigm suffers from a scalability issue, since in real-world Re-ID applications it is difficult to exhaustively label abundant identities over multiple disjoint camera views. To this end, we propose a progressive deep learning method for unsupervised person Re-ID in the wild by Tracklet Association with Spatio-Temporal Regularization (TASTR). In our approach, we first collect tracklet data within each camera by automatic person detection and tracking. Then, an initial Re-ID model is trained based on within-camera triplet construction for person representation learning. After that, based on person visual features and the spatio-temporal constraint, we associate cross-camera tracklets to generate cross-camera triplets and update the Re-ID model. Lastly, with the refined Re-ID model, better visual features of persons can be extracted, which further promotes the association of cross-camera tracklets. The last two steps are iterated multiple times to progressively upgrade the Re-ID model. To facilitate the study, we have collected a new 4K UHD video dataset named Campus4K with full frames and full spatio-temporal information. Experimental results show that, with the spatio-temporal constraint in the training phase, the proposed approach outperforms state-of-the-art unsupervised methods by notable margins on DukeMTMC-reID, and achieves performance competitive with fully supervised methods on both the DukeMTMC-reID and Campus4K datasets.
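The cross-camera association step described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the tracklet representation, the transit-time window, and the similarity threshold are all hypothetical stand-ins for the learned model and calibrated spatio-temporal statistics used in the paper.

```python
import numpy as np

def associate_tracklets(tracklets_a, tracklets_b,
                        time_window=(5.0, 60.0), sim_thresh=0.5):
    """Greedy cross-camera tracklet association (illustrative sketch).

    Appearance matching uses cosine similarity on tracklet feature
    vectors, gated by a plausible camera-to-camera transit-time window
    (the spatio-temporal constraint). The window and threshold values
    here are arbitrary placeholders.
    """
    pairs = []
    for i, ta in enumerate(tracklets_a):
        best_j, best_sim = None, sim_thresh
        for j, tb in enumerate(tracklets_b):
            dt = tb["time"] - ta["time"]
            if not (time_window[0] <= dt <= time_window[1]):
                continue  # transit time implausible: prune this candidate
            fa, fb = ta["feat"], tb["feat"]
            sim = float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            pairs.append((i, best_j))  # matched pair -> cross-camera triplet seed
    return pairs

# Toy usage: one tracklet in camera A, two candidates in camera B.
tracklets_a = [{"time": 0.0, "feat": np.array([1.0, 0.0])}]
tracklets_b = [
    {"time": 2.0,  "feat": np.array([1.0, 0.0])},   # identical look, but dt=2s is
                                                    # outside the window -> pruned
    {"time": 10.0, "feat": np.array([0.9, 0.1])},   # plausible transit -> matched
]
print(associate_tracklets(tracklets_a, tracklets_b))  # → [(0, 1)]
```

In the full pipeline, the matched pairs would supply positives for cross-camera triplets, the Re-ID model would be retrained on them, and the association would be rerun with the improved features, iterating until convergence.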
Authors