4.7 Article

A Cuboid CNN Model With an Attention Mechanism for Skeleton-Based Action Recognition

Journal

IEEE Transactions on Multimedia
Volume 22, Issue 11, Pages 2977-2989

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TMM.2019.2962304

Keywords

Feature extraction; Skeleton; Sensors; Three-dimensional displays; Spatiotemporal phenomena; Hidden Markov models; Neural networks; CNN; action recognition; attention mechanism; feature cuboid

Funding

  1. National Key R&D Program of China [2018YFB1308000]
  2. National Natural Science Foundation of China [61772455, 61572486, 61772508, U1713213]
  3. Yunnan Natural Science Funds [2018FY001(-013), 2019FA045]
  4. Guangdong Technology Project [2017B010110007, 2016B010108010]
  5. Program for Excellent Young Talents of National Natural Science Foundation of Yunnan University [2018YDJQ004]
  6. Program for Excellent Young Talents of Yunnan University [WX069051]
  7. Project of Innovative Research Team of Yunnan Province [2018HC019]
  8. Shenzhen Technology Project [JCYJ20180507182610734, JCYJ20170413152535587]

Abstract

The introduction of depth sensors such as the Microsoft Kinect has driven research in human action recognition. Human skeletal data collected from depth sensors convey a significant amount of information for action recognition. While there has been considerable progress in action recognition, most existing skeleton-based approaches neglect the fact that not all human body parts move during many actions, and they fail to consider the ordinal positions of body joints. Here, motivated by the fact that an action's category is determined by local joint movements, we propose a cuboid model for skeleton-based action recognition. Specifically, a cuboid arranging strategy is developed to organize the pairwise displacements between all body joints into a cuboid action representation. Such a representation is well structured and allows deep CNN models to focus their analysis on the action. Moreover, an attention mechanism is exploited in the deep model, such that the most relevant features are extracted. Extensive experiments on our new Yunnan University-Chinese Academy of Sciences-Multimodal Human Action Dataset (CAS-YNU MHAD), the NTU RGB+D dataset, the UTD-MHAD dataset, and the UTKinect-Action3D dataset demonstrate the effectiveness of our method compared to the current state of the art.
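
As a rough illustration of the cuboid idea described above, the sketch below computes pairwise joint displacements for a skeleton sequence with NumPy and stacks them into a frames x joints x joints x 3 tensor. This is only a minimal sketch under stated assumptions: the joint ordering, the exact arranging strategy, and the function name pairwise_displacement_cuboid are illustrative, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's exact arranging strategy):
# build a cuboid of pairwise joint displacements from a skeleton sequence.
import numpy as np

def pairwise_displacement_cuboid(skeleton):
    """skeleton: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates.

    Returns a cuboid of shape (T, J, J, 3) whose entry [t, i, j] is the
    displacement from joint i to joint j at frame t, i.e. a well-structured
    tensor that a CNN could consume.
    """
    skeleton = np.asarray(skeleton, dtype=np.float32)
    # Broadcast subtraction: (T, 1, J, 3) - (T, J, 1, 3) -> (T, J, J, 3)
    return skeleton[:, None, :, :] - skeleton[:, :, None, :]

if __name__ == "__main__":
    T, J = 64, 25                    # e.g. 25 joints as reported by Kinect v2
    seq = np.random.randn(T, J, 3)   # placeholder skeleton sequence
    cuboid = pairwise_displacement_cuboid(seq)
    print(cuboid.shape)              # (64, 25, 25, 3)
```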
