Article

Tracking Persons-of-Interest via Unsupervised Representation Adaptation

INTERNATIONAL JOURNAL OF COMPUTER VISION (2020)

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION

Volume 128, Issue 1, Pages 96-120

Publisher

SPRINGER

DOI: 10.1007/s11263-019-01212-1

Keywords

Face tracking; Transfer learning; Convolutional neural networks; Triplet loss

Category

Computer Science, Artificial Intelligence

Funding

National Basic Research Program of China (973 Program) [2015CB351705]
National Key Research and Development Program of China [2017YFA0700805]
NSFC [61703344]
Office of Naval Research [N0014-16-1-2314]
Ministry of Science and ICT of Korea [NRF-2017R1A2B4011928, NRF-2017M3C4A7069369]
NSF CRII [1755785]
NSF CAREER [1149783]

Abstract

Multi-face tracking in unconstrained videos is a challenging problem because the faces of one person can often appear drastically different across multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features that are not sufficiently discriminative for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face representations using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are trained only offline on large-scale face image datasets, we automatically generate a large number of training samples using the contextual constraints for a given video, and further adapt the pre-trained face CNN to the characters in that video using the discovered training samples. The embedding feature space is fine-tuned so that the Euclidean distance in the space corresponds to the semantic face similarity. To this end, we devise a symmetric triplet loss function which optimizes the network more effectively than the conventional triplet loss. With the learned discriminative features, we apply an EM clustering algorithm to link tracklets across multiple shots to generate the final trajectories. We extensively evaluate the proposed algorithm on two sets of TV sitcoms and YouTube music videos, analyze the contribution of each component, and demonstrate significant performance improvement over existing techniques.
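
The abstract contrasts a conventional triplet loss with a symmetric one but does not give either formula. As a rough illustration only, the sketch below compares the standard triplet loss with one plausible symmetric form, in which the negative sample is pushed away from both members of the positive pair. The function names, the use of squared Euclidean distance, the margin value, and the exact symmetric formula are all assumptions made for this example, not details taken from the abstract.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin):
    """Conventional triplet loss: only the anchor-to-negative distance is used."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def symmetric_triplet_loss(xi, xj, xk, margin):
    """Assumed symmetric form: the negative xk is compared against both xi and xj.

    xi and xj are embeddings of the same face; xk is a different face.
    """
    avg_neg = 0.5 * (sq_dist(xi, xk) + sq_dist(xj, xk))
    return max(0.0, sq_dist(xi, xj) - avg_neg + margin)


# Toy 2-D embeddings (hypothetical values): xi and xj share an identity.
xi, xj, xk = (0.0, 0.0), (2.0, 0.0), (3.0, 0.0)

conv_anchor_i = triplet_loss(xi, xj, xk, margin=2.0)  # xi as anchor
conv_anchor_j = triplet_loss(xj, xi, xk, margin=2.0)  # xj as anchor
sym_ij = symmetric_triplet_loss(xi, xj, xk, margin=2.0)
sym_ji = symmetric_triplet_loss(xj, xi, xk, margin=2.0)

print(conv_anchor_i, conv_anchor_j, sym_ij, sym_ji)
```

In this toy case the conventional loss depends on which member of the positive pair is chosen as the anchor, while the assumed symmetric form gives the same value when xi and xj are swapped. This shows only the shape of the two losses and says nothing about how either behaves during actual network training.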
