Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 31, Issue 1, Pages 160-174
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.2965574
Keywords
Cross-view action recognition; human skeleton; self-similarity; multi-stream neural network; view-invariant representation
Funding
- National Natural Science Foundation of China [61603341, 61976191, 61873220, 61773272, 61876168]
- Zhejiang Provincial Natural Science Foundation of China [LY19F030015]
- Post-doctoral Fellowship from China Scholarship Council
- Mitacs Globalink Early Career Fellowship
This paper proposes a method for creating view-invariant action descriptions from skeletal self-similarities, learned with a multi-stream neural network. By integrating skeletal self-similarities at multiple scales into the network, the method achieves good robustness to view changes.
Existing research in vision-based action recognition generally focuses on recognizing actions from the same viewpoints seen in the training data. A major challenge in action recognition is the large variation in action representations when actions are captured from very different viewpoints. This paper addresses the problem by learning view-invariant representations from skeletal self-similarities of varying scales with a lightweight multi-stream neural network (MSNN). Since human skeletons have proven to be an effective and easily obtained feature modality for action recognition, we first create a view-invariant action description by formulating the skeletal self-similarities at each frame as an image (SSI), which exhibits high structural stability under view changes. Accordingly, an MSNN is designed based on 3D CNN and LSTM units to learn representations from SSIs at multiple scales, where the multi-scale scheme gives our method good robustness to view changes. In addition, owing to its simplicity, we integrate the computation of SSIs into the MSNN by wrapping it as a custom learnable layer, rather than normalizing and transforming skeletons with hand-crafted preprocessing. Extensive experimental evaluations on three challenging cross-view datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art algorithms on cross-view recognition. The source code of this work will be released shortly to facilitate future studies in this field.
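The core intuition behind the SSI descriptor is that pairwise distances between skeleton joints are unchanged by rigid camera rotations and translations. The sketch below illustrates this property with a per-frame self-similarity matrix built from pairwise Euclidean distances; it is an assumption-laden simplification (joint count, distance metric, and the rotation check are illustrative), not the paper's learnable-layer implementation.

```python
import numpy as np

def self_similarity_image(joints):
    """Pairwise Euclidean distances between skeleton joints for one frame.

    `joints` is an (N, 3) array of 3D joint coordinates. The resulting
    N x N matrix is invariant to rigid rotation and translation of the
    skeleton, which is the property the SSI descriptor relies on.
    (Illustrative sketch only; the paper wraps SSI computation in a
    custom learnable layer inside the MSNN.)
    """
    diff = joints[:, None, :] - joints[None, :, :]   # (N, N, 3) pairwise offsets
    return np.linalg.norm(diff, axis=-1)             # (N, N) distance image

# Check: rotating and translating the skeleton leaves the SSI unchanged.
rng = np.random.default_rng(0)
skel = rng.standard_normal((15, 3))                  # hypothetical 15-joint skeleton
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
ssi_a = self_similarity_image(skel)
ssi_b = self_similarity_image(skel @ R.T + np.array([1.0, -2.0, 0.5]))
assert np.allclose(ssi_a, ssi_b)
```

A stack of such per-frame images over time forms the input that the multi-scale 3D CNN/LSTM streams would consume.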