Article

Spatio-temporal deformable 3D ConvNets with attention for action recognition

Journal

PATTERN RECOGNITION
Volume 98

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2019.107037

Keywords

Action recognition; Spatio-temporal deformable; Attention mechanism; 3D ConvNets

Funding

  1. National Natural Science Foundation of China [61690202, 61872021]
  2. Fundamental Research Funds for Central Universities [YWF-19-BJ-J-271]
  3. Beijing Municipal Science and Technology Commission [Z171100000117022]
  4. State Key Lab of Software Development Environment [SKLSDE-2018ZX-04]

Abstract

The irregularity of human actions poses great challenges for video action recognition. Recently, 3D ConvNet methods have shown promising performance in modelling motion and appearance information. However, the fixed geometric structure of 3D convolution filters largely limits their learning capacity for video action recognition. To address this problem, this paper proposes a spatio-temporal deformable ConvNet module with an attention mechanism, which takes into account the mutual correlations in both the temporal and spatial domains to effectively capture long-range and long-distance dependencies in video actions. Our attention-based deformable module, as a generic module for 3D ConvNets, can adaptively learn more accurate spatio-temporal offsets to model action irregularity. Experiments on two popular datasets (UCF-101 and HMDB-51) demonstrate that our module significantly outperforms state-of-the-art methods. © 2019 Elsevier Ltd. All rights reserved.
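
The abstract does not spell out the module's formulation, so the following is only a generic sketch of the idea it names, not the authors' implementation. It assumes a 3x3x3 sampling grid whose points are shifted by per-point spatio-temporal offsets, trilinear interpolation at the resulting fractional positions, and a softmax over per-point scores standing in for the attention mechanism. All function names (`trilinear`, `softmax`, `deformable_response`) and the way attention is combined with the kernel weights are assumptions made for illustration; in a real network the offsets and scores would be predicted by learned layers.

```python
import math
from itertools import product


def trilinear(volume, t, y, x):
    """Sample a T x H x W nested-list volume at a fractional (t, y, x).

    Positions outside the volume contribute zero.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t0, y0, x0 = math.floor(t), math.floor(y), math.floor(x)
    value = 0.0
    # Blend the 8 integer neighbours of the fractional position.
    for dt, dy, dx in product((0, 1), repeat=3):
        ti, yi, xi = t0 + dt, y0 + dy, x0 + dx
        if 0 <= ti < T and 0 <= yi < H and 0 <= xi < W:
            w = (1 - abs(t - ti)) * (1 - abs(y - yi)) * (1 - abs(x - xi))
            value += w * volume[ti][yi][xi]
    return value


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_response(volume, center, kernel, offsets, scores):
    """Response at one location of a hypothetical attention-weighted
    deformable 3D filter:

        sum_k attn_k * kernel_k * volume(center + grid_k + offset_k)

    `offsets` holds one (dt, dy, dx) per grid point; `scores` holds one
    raw attention score per grid point (both assumed to be given).
    """
    grid = list(product((-1, 0, 1), repeat=3))  # regular 3x3x3 grid
    attn = softmax(scores)
    out = 0.0
    for (gt, gy, gx), (ot, oy, ox), w, a in zip(grid, offsets, kernel, attn):
        sample = trilinear(
            volume,
            center[0] + gt + ot,
            center[1] + gy + oy,
            center[2] + gx + ox,
        )
        out += a * w * sample
    return out
```

With all offsets set to zero and uniform scores, this reduces to a fixed-grid 3D filter scaled by 1/27, which is the rigid geometry the abstract describes as limiting. Non-zero offsets let each sampling point move in time and space, and the softmax scores reweight the points.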
