4.7 Article

Identifying the key frames: An attention-aware sampling method for action recognition

Journal

PATTERN RECOGNITION
Volume 130, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108797

Keywords

Action recognition; Deep learning; Reinforcement learning; Pseudo labels

Funding

  1. Major Project for New Generation of AI [2018AAA0100400]
  2. National Natural Science Foundation of China [61836014, U21B2042, 62006231, 62072457]
  3. National Youth Talent Support Program

Deep learning based methods have made remarkable progress in action recognition. This paper proposes an attention-aware sampling method for action recognition to further enhance the existing deep learning models. The method uses deep reinforcement learning to train an attention agent, which selects relevant frames and discards irrelevant ones. The approach achieves competitive performance on two widely used action recognition datasets.
Deep learning based methods have achieved remarkable progress in action recognition. Existing works mainly focus on designing novel deep architectures to learn video representations. Most existing methods treat sampled frames equally and, at the testing stage, average all frame-level predictions to generate the video-level prediction. However, within a video, discriminative actions may occur sparsely in a few frames, whereas most other frames are irrelevant to the ground truth and may even lead to wrong results. We therefore argue that selecting relevant frames is another important way to improve existing deep learning based action recognition. In this paper, we propose an attention-aware sampling method for action recognition, which aims to discard irrelevant and misleading frames and preserve the most discriminative ones. We formulate the process of mining key frames from videos as a Markov decision process and train the attention agent through deep reinforcement learning without extra labels. The agent takes features and predictions from the baseline model as inputs and generates importance scores for all frames. Moreover, our approach is extensible and can be applied to different existing deep learning based action recognition models. We achieve very competitive action recognition performance on two widely used action recognition datasets. (c) 2022 Elsevier Ltd. All rights reserved.
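
The contrast between averaging all frame-level predictions and averaging only the key frames can be illustrated with a minimal sketch. It is not the authors' implementation: the importance scores are supplied by hand rather than produced by a trained agent, the top-k selection rule and all function names are assumptions made for illustration, and the reinforcement learning step is omitted entirely.

```python
from typing import List, Sequence


def average_predictions(frame_probs: Sequence[Sequence[float]]) -> List[float]:
    """Baseline: average frame-level class probabilities over all given frames."""
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [sum(p[c] for p in frame_probs) / n_frames for c in range(n_classes)]


def select_key_frames(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames, in temporal order.

    Top-k selection is an assumed rule; the abstract only states that the
    agent produces an importance score for every frame.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def attention_aware_prediction(
    frame_probs: Sequence[Sequence[float]], scores: Sequence[float], k: int
) -> List[float]:
    """Average the predictions of the selected key frames only."""
    keep = select_key_frames(scores, k)
    return average_predictions([frame_probs[i] for i in keep])


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy video: 5 frames, 2 classes, ground-truth class 0.
# Only frames 1 and 3 show the action; the others favor the wrong class.
frame_probs = [
    [0.2, 0.8],
    [0.9, 0.1],
    [0.2, 0.8],
    [0.8, 0.2],
    [0.3, 0.7],
]
# Hypothetical importance scores standing in for the agent's output.
scores = [0.1, 0.9, 0.2, 0.8, 0.1]

baseline = average_predictions(frame_probs)
selected = attention_aware_prediction(frame_probs, scores, k=2)

print("all frames ->", argmax(baseline))
print("key frames ->", argmax(selected))
```

In this toy case the irrelevant frames outvote the discriminative ones under plain averaging, while restricting the average to the two highest-scoring frames recovers the ground-truth class.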
