4.7 Article

Multi-stream slowFast graph convolutional networks for skeleton-based action recognition

Journal

IMAGE AND VISION COMPUTING
Volume 109, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.imavis.2021.104141

Keywords

Action recognition; Graph convolutional network; Human skeleton; SlowFast network; Attention

Funding

  1. National Natural Science Foundation of China [61471206, 61871445]
  2. Natural Science Foundation of Jiangsu Province [BK20180088]
  3. Scientific Research Foundation of Nanjing University of Posts and Telecommunications [NY218066]

A SlowFast graph convolution network (SF-GCN) is proposed to improve spatial-temporal feature extraction from skeleton sequences by applying the SlowFast network architecture to the GCN model. SF-GCN consists of a Fast pathway and a Slow pathway, which extract features of fast and slow temporal changes, respectively. The two sets of features are fused through lateral connections and weighted by channel attention. According to the authors, this design improves feature extraction while significantly reducing computational cost.
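The two-pathway idea can be illustrated with a small NumPy sketch. It is not the authors' implementation: the frame-rate ratio `alpha=4`, the strided-sampling lateral connection, and the squeeze-and-excitation style gate in `channel_attention` are all assumptions chosen for illustration, and the graph convolution layers themselves are omitted.

```python
import numpy as np

def sample_pathways(seq, alpha=4):
    """Split a skeleton sequence of shape (C, T, V) into two inputs.

    The Fast input keeps every frame; the Slow input keeps every
    alpha-th frame. alpha=4 is an assumed ratio, not a value from the paper.
    """
    fast = seq
    slow = seq[:, ::alpha, :]
    return fast, slow

def lateral_fuse(slow_feat, fast_feat, alpha=4):
    """Assumed lateral connection: subsample the Fast features in time so
    they align with the Slow features, then concatenate along channels."""
    fast_aligned = fast_feat[:, ::alpha, :]
    return np.concatenate([slow_feat, fast_aligned], axis=0)

def channel_attention(feat, w1, w2):
    """Assumed SE-style channel attention: global average pooling over
    time and joints, a two-layer bottleneck, and a sigmoid gate per channel."""
    squeezed = feat.mean(axis=(1, 2))              # shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # shape (C,), values in (0, 1)
    return feat * gate[:, None, None]
```

For a sequence with 3 coordinate channels, 64 frames, and 25 joints, `sample_pathways` would return inputs of shape `(3, 64, 25)` and `(3, 16, 25)`, and fusing them (using the raw inputs as stand-ins for pathway features) yields a `(6, 16, 25)` tensor that `channel_attention` reweights channel by channel.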
Recently, many efforts have been made to model spatial-temporal features from human skeletons for action recognition by using graph convolutional networks (GCN). A skeleton sequence can precisely represent human pose with a small number of joints, but it still contains considerable redundancy in terms of temporal dependency. To improve the effectiveness of spatial-temporal feature extraction from skeleton sequences, a SlowFast graph convolution network (SF-GCN) is proposed by implementing the architecture of the SlowFast network, which consists of a Fast pathway and a Slow pathway, in the GCN model. The Fast pathway is a lightweight GCN with embedded temporal attention that extracts features of fast temporal changes from the skeleton sequence at a high frame rate and fast refreshing speed. The Slow pathway is a GCN with embedded spatial attention that extracts features of slow temporal changes from the skeleton sequence at a low frame rate and slow refreshing speed. The features of the two pathways are fused by lateral connections and weighted by channel attention. With this design, SF-GCN achieves superior feature extraction ability while the computational cost drops significantly. In addition to the coordinate information of joints, five high-order sequences are introduced to enhance the representation of human action: the edges, and the spatial differences and temporal differences of both joints and edges. Six SF-GCNs are implemented to extract spatial-temporal features from the six kinds of sequences, and their outputs are fused for skeleton-based action recognition; the resulting model is called multi-stream SlowFast graph convolutional networks (MSSF-GCN). Extensive experiments are conducted to evaluate the proposed method on three skeleton-based action recognition databases: NTU RGB + D, NTU RGB + D 120, and Skeleton-Kinetics. The results show that the proposed method is effective for skeleton-based action recognition and achieves recognition accuracy with a clear advantage over state-of-the-art methods. © 2021 Elsevier B.V. All rights reserved.
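As a rough illustration of the six input sequences, the sketch below derives five extra streams from joint coordinates. The abstract does not define the difference operators, so the definitions here are assumptions: an edge is taken as the vector from a joint's parent to the joint, a spatial difference as the offset from a hypothetical reference index `ref` within the same frame, and a temporal difference as the change between consecutive frames. The function name `build_streams` and the `parents` array are likewise hypothetical.

```python
import numpy as np

def build_streams(joints, parents, ref=0):
    """Build six input streams from joint coordinates.

    joints:  array of shape (C, T, V) with C coordinates, T frames, V joints.
    parents: parents[v] is the parent index of joint v (the root is its own parent).
    ref:     assumed reference index for the spatial difference.
    """
    parents = np.asarray(parents)
    # Edge (bone) vector from parent to child; zero for the root joint.
    edges = joints - joints[:, :, parents]

    def spatial_diff(x):
        # Offset of every node from the reference node in the same frame (assumed definition).
        return x - x[:, :, ref:ref + 1]

    def temporal_diff(x):
        # Frame-to-frame change; the first frame is padded with zeros.
        d = np.zeros_like(x)
        d[:, 1:] = x[:, 1:] - x[:, :-1]
        return d

    return {
        "joint": joints,
        "edge": edges,
        "joint_spatial_diff": spatial_diff(joints),
        "joint_temporal_diff": temporal_diff(joints),
        "edge_spatial_diff": spatial_diff(edges),
        "edge_temporal_diff": temporal_diff(edges),
    }
```

Every stream keeps the `(C, T, V)` shape of the input, so under these assumptions each one could be fed to its own SF-GCN without changing the network's input layout.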
