Article

Structural Knowledge Distillation for Efficient Skeleton-Based Action Recognition

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 30, Pages 2963-2976

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TIP.2021.3056895

Keywords

Skeleton; Training; Pose estimation; Feature extraction; Videos; Joints; Knowledge engineering; Skeleton-based action recognition; structural knowledge distillation; graph matching loss; graph convolutional network; gradient revision

Funding

  1. NSFC [U1803264, 62072334, 61672376, 61671325]

This research explores the feasibility of using low-quality skeletons for action recognition and proposes a structural knowledge distillation scheme to minimize the resulting accuracy degradation and improve model robustness. The scheme's effectiveness is demonstrated on several action recognition datasets.
Skeleton data have been extensively used for action recognition because they are robust to dynamic circumstances and complex backgrounds. To guarantee action-recognition performance, advanced but time-consuming algorithms are usually preferred for extracting more accurate and complete skeletons from the scene. However, this may not be acceptable in time- and resource-constrained applications. In this paper, we explore the feasibility of using low-quality skeletons, which can be estimated quickly and easily from the scene, for action recognition. Although low-quality skeletons inevitably degrade action-recognition accuracy, we propose a structural knowledge distillation scheme to minimize this degradation and to improve the recognition model's robustness to uncontrollable skeleton corruptions. More specifically, a teacher that observes high-quality skeletons obtained from a scene is used to help train a student that sees only low-quality skeletons generated from the same scene. At inference time, only the student network is deployed to process low-quality skeletons. Within this scheme, a graph matching loss distills graph structural knowledge at an intermediate representation level. We also propose a new gradient revision strategy that balances mimicking the teacher model against directly improving the student model's accuracy. Experiments are conducted on the Kinetics-400, NTU RGB+D, and Penn Action datasets, and the comparison results demonstrate the effectiveness of our scheme.
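The abstract does not give the exact form of the graph matching loss or of the gradient revision rule, so the Python sketch below is only a hypothetical illustration under stated assumptions. It assumes that "graph structure" is the matrix of pairwise cosine similarities between per-joint intermediate features, and that the loss is the mean squared difference between the teacher's and the student's matrices. For the balancing step it swaps in a PCGrad-style projection of conflicting gradients (from multi-task "gradient surgery"), which is not necessarily the authors' strategy. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def relation_matrix(joints: List[Vector]) -> List[List[float]]:
    """Pairwise joint-to-joint similarities, used here as a stand-in
    (an assumption) for the 'graph structure' of an intermediate layer."""
    return [[cosine(u, v) for v in joints] for u in joints]


def graph_matching_loss(teacher: List[Vector], student: List[Vector]) -> float:
    """Hypothetical structural loss: mean squared difference between the
    teacher's and the student's N x N joint relation matrices."""
    rt, rs = relation_matrix(teacher), relation_matrix(student)
    n = len(rt)
    return sum((rt[i][j] - rs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def revise_gradient(g_task: Vector, g_distill: Vector) -> Vector:
    """PCGrad-style revision (a swapped-in technique, not the paper's rule):
    if the distillation gradient conflicts with the task gradient (negative
    dot product), remove its component along the task gradient; then sum."""
    dot = sum(d * t for d, t in zip(g_distill, g_task))
    norm_sq = sum(t * t for t in g_task)
    if dot < 0 and norm_sq > 0:
        g_distill = [d - (dot / norm_sq) * t for d, t in zip(g_distill, g_task)]
    return [t + d for t, d in zip(g_task, g_distill)]
```

Because this loss compares N x N relation matrices, the teacher and student only need the same number of joints N, not the same feature width, and cosine similarity makes it insensitive to the overall scale of the features. In the revision step, `revise_gradient([1.0, 0.0], [-1.0, 1.0])` returns `[1.0, 1.0]`: the conflicting part of the distillation gradient is dropped, so mimicking the teacher no longer pushes directly against the classification objective.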
