Article

Contrastive predictive coding with transformer for video representation learning

Journal

NEUROCOMPUTING
Volume 482, Pages 154-162

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.11.031

Keywords

Contrastive Learning; Self-Attention; Video Representation

Funding

  1. Education Department of Jiangxi Province of China [GJJ204912]
  2. Science and Technology Bureau of Ganzhou City of China [[2020]60]

Abstract

This paper presents a novel framework of self-supervised learning for video representation. Inspired by Contrastive Predictive Coding and Self-attention, we make the following contributions. First, we propose the Contrastive Predictive Coding with Transformer (CPCTR) framework for video representation learning in a self-supervised fashion. Second, we introduce the Transformer architecture to CPCTR to capture long-range spatio-temporal dependencies and thereby facilitate the learning of slow features in video, and we conduct an analysis of the Transformer component in our model to show its effectiveness. Finally, we evaluate our model by first training on the UCF101 dataset with self-supervised learning and then fine-tuning on downstream video classification tasks. Using RGB-only video data, we achieve state-of-the-art self-supervised performance on both UCF101 (Top-1 accuracy of 99.3%) and HMDB51 (Top-1 accuracy of 82.4%), and we show that CPCTR even outperforms fully supervised methods on the two datasets. The code is available at https://github.com/yliu1229/CPCTR. (C) 2021 Elsevier B.V. All rights reserved.
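To make the idea concrete, the following is a minimal PyTorch sketch of contrastive predictive coding with a Transformer aggregator; it is not the authors' released CPCTR code (see the GitHub link above). Short video blocks are encoded, a Transformer summarizes the observed blocks into a context vector, and per-step heads predict future block embeddings under an InfoNCE loss. The block encoder, layer sizes, number of prediction steps, and the choice of drawing negatives only from other videos in the batch are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed, not the authors' CPCTR implementation): contrastive
# predictive coding over video blocks with a Transformer aggregator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CPCTransformerSketch(nn.Module):
    """Encode short video blocks, aggregate seen blocks with a Transformer,
    and predict future block embeddings with an InfoNCE objective."""

    def __init__(self, feat_dim=256, n_heads=4, n_layers=2, pred_steps=3):
        super().__init__()
        self.pred_steps = pred_steps
        # Per-block encoder: a tiny 3D CNN stands in for the real backbone.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Transformer aggregator models long-range temporal dependencies
        # across the already-observed blocks.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.aggregator = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One linear prediction head per future step.
        self.predictors = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(pred_steps)])

    def forward(self, blocks):
        # blocks: (B, N, C, T, H, W) -- N short clips cut from each video.
        B, N = blocks.shape[:2]
        z = self.encoder(blocks.flatten(0, 1)).view(B, N, -1)    # (B, N, D)
        ctx = self.aggregator(z[:, : N - self.pred_steps])       # observed blocks
        c_t = ctx[:, -1]                                         # context vector
        loss = 0.0
        for k, head in enumerate(self.predictors):
            pred = F.normalize(head(c_t), dim=-1)                # predicted future
            target = F.normalize(z[:, N - self.pred_steps + k], dim=-1)
            logits = pred @ target.t()                           # (B, B) similarities
            labels = torch.arange(B, device=logits.device)       # positives on diagonal
            loss = loss + F.cross_entropy(logits, labels)        # InfoNCE per step
        return loss / self.pred_steps


if __name__ == "__main__":
    model = CPCTransformerSketch()
    clips = torch.randn(4, 8, 3, 8, 64, 64)  # 4 videos, 8 blocks of 8 frames each
    print(model(clips).item())
```

For the downstream evaluation described above, one would typically discard the prediction heads after pretraining and fine-tune the encoder plus aggregator with a linear classifier on top of the context vector for video classification.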
