Article

Learning shape and motion representations for view invariant skeleton-based action recognition

Journal

PATTERN RECOGNITION
Volume 103

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107293

Keywords

Human action recognition; Skeleton sequence; Representation learning; View invariant; Geometric Algebra

Funding

  1. National Natural Science Foundation of China [61771319, 61871154]
  2. Natural Science Foundation of Guangdong Province [2017A030313343, 2019A1515011307]
  3. Shenzhen Science and Technology Project [JCYJ20180507182259896]

Skeleton-based action recognition, which analyses the spatial configuration and temporal dynamics of a human action from skeleton data, is a task attracting increasing attention and has been widely applied in intelligent video surveillance and human-computer interaction. Designing an effective framework to learn discriminative spatial and temporal characteristics for skeleton-based action recognition remains a challenging problem. The shape and motion representations of skeleton sequences are the direct embodiments of spatial and temporal characteristics respectively, and are well suited to describing human actions. In this work, we propose an original unified framework that learns comprehensive shape and motion representations from skeleton sequences using Geometric Algebra. We first construct a skeleton sequence space as a subset of Geometric Algebra to represent each skeleton sequence along both the spatial and temporal dimensions. A rotor-based view transformation method is then proposed to eliminate the effect of viewpoint variation while preserving the relative spatio-temporal relations among skeleton frames in a sequence. We also construct a spatio-temporal view invariant model (STVIM) to collectively integrate the spatial configuration and temporal dynamics of skeleton joints and bones. In STVIM, mutually compensating shape and motion representations of skeleton sequences are jointly learned to describe skeleton-based actions comprehensively. Furthermore, a selected multi-stream Convolutional Neural Network is employed to extract and fuse deep features from mapping images of the learned representations for skeleton-based action recognition. Experimental results on the NTU RGB+D, Northwestern-UCLA and UTD-MHAD datasets consistently verify the effectiveness of the proposed method and its superior performance over state-of-the-art competitors. (C) 2020 Elsevier Ltd. All rights reserved.
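The rotor-based view transformation is described only at a high level in the abstract. As a minimal illustrative sketch (not the authors' implementation), a 3D rotor acting on joint coordinates can be written via the standard isomorphism between the even subalgebra of G(3,0) and the unit quaternions; the `rotor` and `apply_rotor` names and the toy skeleton below are assumptions for illustration:

```python
import numpy as np

def rotor(axis, angle):
    """Rotor R = cos(theta/2) - sin(theta/2)*B for rotation about `axis`.

    B is the unit bivector dual to `axis`; under the quaternion isomorphism
    R is stored as (w, x, y, z) with w = cos(theta/2), (x, y, z) = sin(theta/2)*axis.
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    half = angle / 2.0
    return np.concatenate(([np.cos(half)], np.sin(half) * axis))

def apply_rotor(R, points):
    """Sandwich product v' = R v R~ applied to an (N, 3) array of joints."""
    w, u = R[0], R[1:]
    # Expanded form of the sandwich product for a 3D vector:
    # v' = v + 2w (u x v) + 2 u x (u x v)
    uv = np.cross(u, points)
    return points + 2.0 * w * uv + 2.0 * np.cross(u, uv)

# Rotate a toy 3-joint "skeleton" by 90 degrees about the vertical (z) axis,
# e.g. to normalize camera viewpoint before learning representations.
skeleton = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0]])
R = rotor([0.0, 0.0, 1.0], np.pi / 2)
print(np.round(apply_rotor(R, skeleton), 6))
```

Because the same rotor is applied to every joint in every frame, the relative spatio-temporal relations within the sequence are preserved, which is the property the abstract attributes to the view transformation.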

