Article

Selective spatiotemporal features learning for dynamic gesture recognition

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 169, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2020.114499

Keywords

Dynamic gesture recognition; Deep learning; Spatiotemporal features learning; Heterogeneous network; Attention mechanism

Funding

  1. National Natural Science Foundation of China [61673079]
  2. Natural Science Foundation of Chongqing [cstc2018jcyjAX0160]
  3. Science and Technology Project of Chongqing Education Committee [KJQN201902404]

The paper introduces a novel gesture recognition model architecture that combines the ResC3D network and ConvLSTM with a dynamic select mechanism called Selective Spatiotemporal features learning (SeST). This heterogeneous network system can simultaneously learn short-term and long-term spatiotemporal features, outperforming other methods.
Gesture recognition, which aims to understand meaningful movements of the human body, plays an essential role in human-computer interaction. The key to gesture recognition is learning compact and effective spatiotemporal information. However, it remains a challenging task because of gesture-irrelevant factors. A number of attempts have been made to address this problem by cascading deep heterogeneous architectures. However, this cascading strategy cannot capture both local and global spatiotemporal features at each stage of feature learning. In this paper, we propose a novel refined fusion model architecture that combines the ResC3D network and Convolutional LSTM (ConvLSTM) with a dynamic select mechanism called Selective Spatiotemporal features learning (SeST). Such a heterogeneous network system can learn short-term and long-term spatiotemporal features simultaneously, and the two kinds of features are complementary to each other. The SeST block uses soft attention to let the ResC3D network and ConvLSTM adaptively adjust their contributions to classification during feature learning. The method has been evaluated on three publicly available datasets: the Sheffield Kinect Gesture (SKIG) dataset, the ChaLearn LAP large-scale isolated gesture dataset (IsoGD), and the EgoGesture dataset. Experimental results show that the proposed method outperforms other state-of-the-art methods. In addition, our model is end-to-end and can be embedded in many intelligent system applications.
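
The abstract does not give the internals of the SeST block, so the following is only a minimal sketch of the general idea it names: soft attention that weighs a short-term branch against a long-term branch. It assumes each branch has already been reduced to a per-channel feature vector and that per-channel attention logits come from some learned layer. The function name `soft_select`, its parameters, and the example values are all hypothetical and are not taken from the paper.

```python
import math


def soft_select(short_feat, long_feat, short_logits, long_logits):
    """Fuse two per-channel feature vectors with soft attention.

    For every channel, a softmax over the two branch logits yields a pair
    of weights that sum to 1; the fused value is the weighted sum of the
    two branch features. This is an illustrative assumption, not the
    paper's actual SeST block.
    """
    fused, weights = [], []
    for s, l, zs, zl in zip(short_feat, long_feat, short_logits, long_logits):
        m = max(zs, zl)  # subtract the max logit for numerical stability
        es, el = math.exp(zs - m), math.exp(zl - m)
        w_short = es / (es + el)
        w_long = el / (es + el)
        weights.append((w_short, w_long))
        fused.append(w_short * s + w_long * l)
    return fused, weights


# Hypothetical channel descriptors from a short-term (3D-CNN-like) branch
# and a long-term (ConvLSTM-like) branch.
short_feat = [1.0, 2.0, 3.0]
long_feat = [3.0, 2.0, 1.0]

# Hypothetical logits; in a trained model a learned layer would produce them.
short_logits = [0.0, 2.0, -2.0]
long_logits = [0.0, 0.0, 0.0]

fused, weights = soft_select(short_feat, long_feat, short_logits, long_logits)
print(fused)
print(weights)
```

With equal logits the two branches are simply averaged. A larger logit for one branch shifts that channel toward the branch's feature, which is one way the contributions could be adjusted adaptively per channel.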
