Journal
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume 30, Issue 11, Pages 3347-3359
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2019.2891244
Keywords
Feature extraction; Spatiotemporal phenomena; Neural networks; Streaming media; Solid modeling; Learning systems; Computer science; 3-D convolution; global representations; person reidentification (re-ID); vector of local aggregated descriptors (VLAD)
Funding
- Fusion of Digital Microscopy and Plain Text Reports through Improved Pathology [ARC LP160101797]
- National Natural Science Foundation of China [61432019, 61732008, 61725203, 61806035]
Abstract
We present global deep video representation learning for video-based person reidentification (re-ID), which aggregates local 3-D features across the entire video extent. Existing methods typically extract frame-wise deep features from 2-D convolutional networks (ConvNets), which are then pooled temporally to produce video-level representations. However, 2-D ConvNets lose temporal priors immediately after the convolutions, and a separate temporal pooling step is limited in capturing human motion over short sequences. In this paper, we present global video representation learning as a novel layer, complementary to 3-D ConvNets, that captures the appearance and motion dynamics of full-length videos. Nevertheless, encoding each video frame in its entirety and computing aggregate global representations across all frames is highly challenging due to occlusions and misalignments. To resolve this, our proposed network is further augmented with 3-D part alignment, which learns local features through a soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3-D features achieve state-of-the-art results on three benchmark data sets: MARS, Imagery Library for Intelligent Detection Systems-Video Re-identification, and PRID2011.
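The abstract's core idea of soft-attention aggregation over time can be sketched in a few lines: each frame descriptor receives a learned relevance score, the scores are normalized with a softmax, and the video-level representation is the attention-weighted average of the frame features. This is a minimal illustration only; the function names, the single attention vector `w`, and the random features are assumptions, not the paper's actual 3-D part-alignment module.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention_pool(frame_feats, w):
    """Aggregate per-frame features into one video-level vector.

    frame_feats: (T, D) array of per-frame descriptors.
    w: (D,) attention parameter (hypothetical stand-in for the
       paper's learned soft-attention module).
    Returns a (D,) identity-discriminative representation.
    """
    scores = frame_feats @ w      # (T,) relevance score per frame
    alpha = softmax(scores)       # attention weights, sum to 1
    return alpha @ frame_feats    # weighted average over time

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))   # 8 frames, 16-D features
w = rng.standard_normal(16)
video_repr = soft_attention_pool(feats, w)
```

Unlike plain average pooling, the attention weights let occluded or misaligned frames contribute less to the final representation, which is the motivation the abstract gives for attending before aggregating.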