Article

Learning shape and motion representations for view invariant skeleton-based action recognition

Journal

PATTERN RECOGNITION
Volume 103

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107293

Keywords

Human action recognition; Skeleton sequence; Representation learning; View invariant; Geometric Algebra

Funding

  1. National Natural Science Foundation of China [61771319, 61871154]
  2. Natural Science Foundation of Guangdong Province [2017A030313343, 2019A1515011307]
  3. Shenzhen Science and Technology Project [JCYJ20180507182259896]

Skeleton-based action recognition, which analyses the spatial configuration and temporal dynamics of a human action from skeleton data, is a task attracting increasing attention and has been widely applied in intelligent video surveillance and human-computer interaction. Designing an effective framework to learn discriminative spatial and temporal characteristics for skeleton-based action recognition remains a challenging problem. The shape and motion representations of skeleton sequences are the direct embodiments of spatial and temporal characteristics respectively, and are well suited to describing human actions. In this work, we propose an original unified framework that learns comprehensive shape and motion representations from skeleton sequences using Geometric Algebra. We first construct a skeleton sequence space as a subset of Geometric Algebra to represent each skeleton sequence along both the spatial and temporal dimensions. A rotor-based view transformation method is then proposed to eliminate the effect of viewpoint variation while preserving the relative spatio-temporal relations among skeleton frames in a sequence. We also construct a spatio-temporal view invariant model (STVIM) to collectively integrate the spatial configuration and temporal dynamics of skeleton joints and bones. In STVIM, mutually compensating shape and motion representations of skeleton sequences are jointly learned to describe skeleton-based actions comprehensively. Furthermore, a selected multi-stream Convolutional Neural Network is employed to extract and fuse deep features from mapping images of the learned representations for skeleton-based action recognition. Experimental results on the NTU RGB+D, Northwestern-UCLA and UTD-MHAD datasets consistently verify the effectiveness of the proposed method and its superior performance over state-of-the-art competitors. (C) 2020 Elsevier Ltd. All rights reserved.
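The rotor-based view transformation is described only at a high level in the abstract. As a minimal illustrative sketch (not the authors' implementation), a 3D rotor acting on joint coordinates can be written via the standard isomorphism between the even subalgebra of G(3,0) and the unit quaternions; the `rotor` and `apply_rotor` names and the toy skeleton below are assumptions for illustration:

```python
import numpy as np

def rotor(axis, angle):
    """Rotor R = cos(theta/2) - sin(theta/2)*B for rotation about `axis`.

    B is the unit bivector dual to `axis`; under the quaternion isomorphism
    R is stored as (w, x, y, z) with w = cos(theta/2), (x, y, z) = sin(theta/2)*axis.
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    half = angle / 2.0
    return np.concatenate(([np.cos(half)], np.sin(half) * axis))

def apply_rotor(R, points):
    """Sandwich product v' = R v R~ applied to an (N, 3) array of joints."""
    w, u = R[0], R[1:]
    # Expanded form of the sandwich product for a 3D vector:
    # v' = v + 2w (u x v) + 2 u x (u x v)
    uv = np.cross(u, points)
    return points + 2.0 * w * uv + 2.0 * np.cross(u, uv)

# Rotate a toy 3-joint "skeleton" by 90 degrees about the vertical (z) axis,
# e.g. to normalize camera viewpoint before learning representations.
skeleton = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0]])
R = rotor([0.0, 0.0, 1.0], np.pi / 2)
print(np.round(apply_rotor(R, skeleton), 6))
```

Because the same rotor is applied to every joint in every frame, the relative spatio-temporal relations within the sequence are preserved, which is the property the abstract attributes to the view transformation.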

