4.7 Article

Selective spatiotemporal features learning for dynamic gesture recognition

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 169

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2020.114499

Keywords

Dynamic gesture recognition; Deep learning; Spatiotemporal features learning; Heterogeneous network; Attention mechanism

Funding

  1. National Natural Science Foundation of China [61673079]
  2. Natural Science Foundation of Chongqing [cstc2018jcyjAX0160]
  3. Science and Technology Project of Chongqing Education Committee [KJQN201902404]

The paper introduces a gesture recognition architecture that combines the ResC3D network and ConvLSTM through a dynamic select mechanism called Selective Spatiotemporal features learning (SeST). This heterogeneous network learns short-term and long-term spatiotemporal features simultaneously and is reported to outperform other state-of-the-art methods on three public datasets.
Gesture recognition, which aims to understand meaningful movements of human bodies, plays an essential role in human-computer interaction. The key to gesture recognition is to learn compact and effective spatiotemporal information. However, it remains a challenging task because of gesture-irrelevant factors. A number of attempts have been made to address this problem by cascading deep heterogeneous architectures. However, this cascading strategy cannot capture both local and global spatiotemporal features at each stage of feature learning. In this paper, we propose a novel refined fusion model architecture combining the ResC3D network and Convolutional LSTM (ConvLSTM) with a dynamic select mechanism called Selective Spatiotemporal features learning (SeST). Such a heterogeneous network system is able to simultaneously learn short-term and long-term spatiotemporal features, which are complementary to each other. The SeST block enables the ResC3D network and ConvLSTM to adaptively adjust their contributions to classification during feature learning with soft attention. The method has been evaluated on three publicly available datasets: the Sheffield Kinect Gesture (SKIG) dataset, the ChaLearn LAP large scale isolated gesture dataset (IsoGD), and the EgoGesture dataset. Experimental results show that the proposed method outperforms other state-of-the-art methods. In addition, our model is an end-to-end model, which can be embedded in many intelligent system applications.
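
The abstract says that two branches adjust their contributions through soft attention. The toy sketch below illustrates that general idea only and is not the authors' SeST block: it takes one per-channel descriptor from each branch, computes a softmax weight for each branch, and returns their weighted sum. The function names, the per-channel scalars `w_short` and `w_long`, and the use of plain Python lists are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fuse(short_term, long_term, w_short, w_long):
    """Fuse two per-channel feature vectors with soft attention.

    short_term, long_term: per-channel descriptors, e.g. pooled outputs
        of a 3D-convolution branch and a ConvLSTM branch (assumed inputs).
    w_short, w_long: hypothetical per-channel scalars that turn the
        shared descriptor into one logit per branch.
    Returns (fused, weights), where weights holds one
    (short_weight, long_weight) pair per channel.
    """
    fused, weights = [], []
    for s, l, ws, wl in zip(short_term, long_term, w_short, w_long):
        shared = s + l  # combine both branches into one descriptor
        a_s, a_l = softmax([ws * shared, wl * shared])  # pair sums to 1
        weights.append((a_s, a_l))
        fused.append(a_s * s + a_l * l)  # convex combination per channel
    return fused, weights


# Usage: with zero scalars both branches get equal weight, so the
# fused vector is the per-channel average of the two inputs.
fused, weights = selective_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0])
```

Because the two weights for each channel sum to one, every fused value lies between the two branch values. In a trained network the logits would come from learned layers rather than fixed scalars.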
