Article

Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition

Journal

INFORMATION SCIENCES
Volume 569, Pages 90-109

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.04.023

Keywords

Skeleton based action recognition; Skeleton data augmentation; Unsupervised deep learning; Contrastive learning; Momentum LSTM

Funding

  1. National Key Research and Development Program of China [2019YFA0706200]
  2. National Natural Science Foundation of China [61632014, 61627808]

The paper proposes a contrastive action learning paradigm named AS-CAL, which learns action representations in an unsupervised manner by contrasting the similarity between augmented instances of skeleton sequences and encoding long-term action dynamics with a momentum LSTM. The approach significantly outperforms hand-crafted methods and even achieves superior performance to many supervised learning methods.
Action recognition via 3D skeleton data is an emerging and important topic. Most existing methods rely on hand-crafted descriptors to recognize actions, or perform supervised action representation learning with massive labels. In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose to contrast the similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns (pattern-invariance) across different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, which is implemented as the momentum-based moving average of the LSTM-based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary to facilitate contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10-50% Top-1 accuracy, and it can even achieve superior performance to many supervised learning methods (our code is available at https://github.com/Mikexu007/AS-CAL). (c) 2021 Elsevier Inc. All rights reserved.
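
The three mechanisms named in the abstract (a key encoder kept as a momentum-based moving average of the query encoder, a queue of previously encoded keys, and a contrastive objective) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists in place of LSTM weights and encodings, a standard InfoNCE-style loss as a stand-in for the paper's objective, and the function names, the momentum value 0.999, the temperature 0.07, and the queue size are all assumptions chosen for illustration.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Moving-average update: key <- m * key + (1 - m) * query.

    The parameter lists stand in for encoder weights (assumption:
    a real model would hold LSTM weight tensors instead).
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE-style loss: low when the query matches its positive key
    more closely than the negative keys drawn from the queue."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_sum_exp - logits[0]


# First-in, first-out queue of encoded keys (hypothetical size 4):
# once full, appending a new key discards the oldest one.
key_queue = deque(maxlen=4)
for key in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.7, 0.7]):
    key_queue.append(key)

# Toy encodings of two augmented views of the same sequence.
query = [1.0, 0.1]
positive = [0.9, 0.2]
loss = info_nce_loss(query, positive, list(key_queue))
```

In this toy setup, `query` and `positive` play the role of encodings of two augmentations of one skeleton sequence, and the queued keys act as negatives. The loss shrinks as the query aligns with its positive key and moves away from the queued ones.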
