Journal
NEUROCOMPUTING
Volume 482, Issue -, Pages 154-162
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2021.11.031
Keywords
Contrastive Learning; Self-Attention; Video Representation
Funding
- Education Department of Jiangxi Province of China [GJJ204912]
- Science and Technology Bureau of Ganzhou City of China [[2020]60]
Abstract
This paper presents a novel self-supervised learning framework for video representation. Inspired by Contrastive Predictive Coding and self-attention, we make the following contributions. First, we propose the Contrastive Predictive Coding with Transformer (CPCTR) framework for self-supervised video representation learning. Second, we introduce the Transformer architecture into CPCTR to capture long-range spatio-temporal dependencies and thereby facilitate the learning of slow features in video, and we analyze the Transformer in our model to demonstrate its effectiveness. Finally, we evaluate our model by first training on the UCF101 dataset with self-supervised learning and then fine-tuning on downstream video classification tasks. Using RGB-only video data, we achieve state-of-the-art self-supervised performance on both UCF101 (Top-1 accuracy of 99.3%) and HMDB51 (Top-1 accuracy of 82.4%); CPCTR even outperforms fully supervised methods on the two datasets. The code is available at https://github.com/yliu1229/CPCTR. (C) 2021 Elsevier B.V. All rights reserved.
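Contrastive-predictive-coding frameworks such as the one described here are typically trained with the InfoNCE objective: a predicted future feature is scored against one true (positive) feature and several negatives, and the loss is the negative log-softmax of the positive's score. The abstract does not give the paper's exact loss details, so the following is a minimal pure-Python sketch under assumed choices (cosine similarity, temperature 0.1), not the authors' implementation:

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def info_nce(pred, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax score of the positive among all candidates.

    `pred` is the predicted feature, `positive` the matching true feature,
    `negatives` a list of distractor features. Temperature is an assumed
    hyperparameter, not taken from the paper.
    """
    logits = [cosine(pred, positive) / temperature]
    logits += [cosine(pred, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)


# When the prediction aligns with the positive, the loss is near zero;
# when it aligns with a negative instead, the loss is large.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce([0.0, 1.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
```

Minimizing this loss pushes the predicted feature toward the true future feature and away from the distractors, which is what drives the representation learning; in the paper's setting the predictor capturing long-range dependencies is the Transformer.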