Article

Focalized contrastive view-invariant learning for self-supervised skeleton-based action recognition

Journal

NEUROCOMPUTING
Volume 537, Pages 198-209

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2023.03.070

Keywords

Self-supervised learning; Skeleton-based action recognition; Contrastive learning


In this work, a self-supervised framework called FoCoViL is proposed, which associates actions with common view-invariant properties and simultaneously separates dissimilar viewpoints by maximizing mutual information between multi-view sample pairs. An adaptive focalization method based on pairwise similarity is further proposed to enhance contrastive learning for a clearer cluster boundary. FoCoViL performs well on both unsupervised and supervised classifiers, and the proposed contrastive-based focalization generates a more discriminative latent representation.
Learning view-invariant representation is key to improving feature discrimination power for skeleton-based action recognition. Existing approaches cannot effectively remove the impact of viewpoint because their representations remain implicitly view-dependent. In this work, we propose a self-supervised framework called Focalized Contrastive View-invariant Learning (FoCoViL), which significantly suppresses view-specific information on a representation space where the viewpoints are coarsely aligned. By maximizing mutual information with an effective contrastive loss between multi-view sample pairs, FoCoViL associates actions with common view-invariant properties and simultaneously separates the dissimilar ones. We further propose an adaptive focalization method based on pairwise similarity to enhance contrastive learning for a clearer cluster boundary in the learned space. Unlike many existing self-supervised representation learning methods that rely heavily on supervised classifiers, FoCoViL performs well on both unsupervised and supervised classifiers with superior recognition performance. Extensive experiments also show that the proposed contrastive-based focalization generates a more discriminative latent representation. © 2023 Elsevier B.V. All rights reserved.
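The abstract describes a contrastive loss over multi-view sample pairs, modulated by an adaptive focalization term based on pairwise similarity. The sketch below illustrates the general idea with an InfoNCE-style cross-view loss whose positive terms are re-weighted by a focal factor; the function name, the `(1 - p)^gamma` weighting form, and the `gamma`/`tau` parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def focal_contrastive_loss(z1, z2, tau=0.1, gamma=2.0):
    """Illustrative focalized InfoNCE-style loss (not the paper's exact loss).

    z1, z2: (N, D) L2-normalized embeddings of the same N action samples
    captured from two different viewpoints. Positive pairs are (z1[i], z2[i]);
    all other cross-view pairs act as negatives. The focal weight
    (1 - p_i)**gamma down-weights pairs that are already well aligned,
    so optimization focuses on hard, poorly aligned pairs.
    """
    # Temperature-scaled cosine similarity between the two views.
    sim = z1 @ z2.T / tau                                  # (N, N)
    # Row-wise softmax (numerically stabilized) over all cross-view pairs.
    exp_sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    p = exp_sim / exp_sim.sum(axis=1, keepdims=True)
    pos = np.diag(p)                                       # probability of the positive pair
    # Focal modulation: low-probability (hard) positives get larger weight.
    loss = -((1.0 - pos) ** gamma) * np.log(pos + 1e-12)
    return loss.mean()

# Toy usage: one "view" is a slightly perturbed copy of the other.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
noisy = z + 0.05 * rng.normal(size=z.shape)
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
print(focal_contrastive_loss(z, noisy))
```

With well-aligned pairs the positive probabilities approach 1, so the focal factor drives their contribution toward zero; misaligned pairs dominate the gradient, which is the intuition behind similarity-based focalization.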
