Article

Identifying the key frames: An attention-aware sampling method for action recognition

PATTERN RECOGNITION (2022)

Journal

PATTERN RECOGNITION

Volume 130, Issue -, Pages -

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.patcog.2022.108797

Keywords

Action recognition; Deep learning; Reinforcement learning; Pseudo labels

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

Major Project for New Generation of AI [2018AAA0100400]
National Natural Science Foundation of China [61836014, U21B2042, 62006231, 62072457]
National Youth Talent Support Program

Smart summary

Deep learning based methods have made remarkable progress in action recognition. This paper proposes an attention-aware sampling method for action recognition to further enhance existing deep learning models. The method uses deep reinforcement learning to train an attention agent, which selects relevant frames and discards irrelevant ones. The approach achieves competitive performance on two widely used action recognition datasets.

Abstract

Deep learning based methods have achieved remarkable progress in action recognition. Existing works mainly focus on designing novel deep architectures to learn video representations for action recognition. Most existing methods treat sampled frames equally and average all the frame-level predictions to generate video-level predictions at the testing stage. However, within a video, discriminative actions may occur sparsely in a few frames, whereas most other frames are irrelevant to the ground truth and may even lead to wrong results. We therefore argue that selecting relevant frames is another important way to improve existing deep learning based action recognition methods. In this paper, we propose an attention-aware sampling method for action recognition, which aims to discard the irrelevant and misleading frames and preserve the most discriminative frames. We formulate the process of mining key frames from videos as a Markov decision process and train the attention agent through deep reinforcement learning without extra labels. The agent takes features and predictions from the baseline model as inputs and generates importance scores for all frames. Moreover, our approach is extensible and can be applied to different existing deep learning based action recognition models. We achieve very competitive action recognition performance on two widely used action recognition datasets. © 2022 Elsevier Ltd. All rights reserved.
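The contrast the abstract draws, between averaging all frame-level predictions and keeping only the most discriminative frames, can be illustrated with a small sketch. This is not the paper's implementation. The importance scores below are hand-written placeholders standing in for the output of a trained attention agent, and the top-k selection rule, the function names, and the toy probabilities are all assumptions made for illustration.

```python
def average_predictions(frame_probs, keep=None):
    """Average per-frame class probabilities over the frames listed in `keep`.

    With keep=None every frame is used, which matches the uniform averaging
    the abstract attributes to most existing methods.
    """
    if keep is None:
        keep = range(len(frame_probs))
    rows = [frame_probs[i] for i in keep]
    n_classes = len(rows[0])
    return [sum(row[c] for row in rows) / len(rows) for c in range(n_classes)]


def top_k_frames(importance, k):
    """Return the indices of the k highest-scoring frames, in temporal order.

    Hypothetical selection rule: the abstract only says the agent produces
    importance scores, not how frames are chosen from them.
    """
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(order[:k])


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Toy video: 6 frames, 3 classes, true class = 2.
# Only frames 2 and 4 show the discriminative action; the rest mislead.
frame_probs = [
    [0.6, 0.3, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]
# Placeholder scores; in the paper these would come from the attention agent.
importance = [0.1, 0.2, 0.9, 0.1, 0.8, 0.2]

baseline = average_predictions(frame_probs)
key_frames = top_k_frames(importance, k=2)
selected = average_predictions(frame_probs, keep=key_frames)

print("all frames  ->", argmax(baseline))
print("key frames  ->", key_frames, "->", argmax(selected))
```

In this toy case the four misleading frames outvote the two discriminative ones under uniform averaging, while averaging only the two highest-scoring frames recovers the true class. How the agent is trained to produce such scores (the Markov decision process and its reward) is not specified in the abstract and is not modeled here.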
