Article

Dynamic gesture recognition based on feature fusion network and variant ConvLSTM

Journal

IET IMAGE PROCESSING
Volume 14, Issue 11, Pages 2480-2486

Publisher

WILEY
DOI: 10.1049/iet-ipr.2019.1248

Keywords

learning (artificial intelligence); video signal processing; feature extraction; gesture recognition; image classification; image fusion; recurrent neural nets; convolutional neural nets; image sequences; spatiotemporal phenomena; human computer interaction; variant ConvLSTM; human communication; human-computer interaction; dynamic gesture recognition method; deep learning; spatiotemporal feature extraction; gesture recognition architecture; feature fusion network; local aspects; global aspects; deep aspects; local spatiotemporal feature information; 3D residual network; channel feature fusion; global spatiotemporal information; multifeature fusion depthwise separable network; higher-level features; depth feature information; SKIG dataset; Sheffield Kinect Gesture dataset; video sequence; gesture feature information; variant convolutional long short-term memory; Jester dataset; classification accuracies

Funding

  1. National Key Research and Development Program of China [2018YFB1306900]
  2. National Natural Science Foundation of China

Gesture is a natural form of human communication, and it is of great significance in human-computer interaction. In dynamic gesture recognition methods based on deep learning, the key is to obtain comprehensive gesture feature information. To address the inadequate extraction of spatiotemporal features and the loss of feature information in current dynamic gesture recognition, a new gesture recognition architecture is proposed that combines a feature fusion network with a variant convolutional long short-term memory (ConvLSTM). The architecture extracts spatiotemporal feature information from local, global, and deep aspects, and uses feature fusion to alleviate the loss of feature information. First, local spatiotemporal feature information is extracted from the video sequence by a 3D residual network based on channel feature fusion. Then the authors use the variant ConvLSTM to learn the global spatiotemporal information of the dynamic gesture, introducing an attention mechanism into the gate structure of the ConvLSTM. Finally, a multi-feature fusion depthwise separable network is used to learn higher-level features, including depth feature information. The proposed approach obtains very competitive performance on the Jester dataset, with a classification accuracy of 95.59%, and achieves state-of-the-art performance with 99.65% accuracy on the SKIG (Sheffield Kinect Gesture) dataset.
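The pipeline described in the abstract (local 3D spatiotemporal features, a ConvLSTM with an attention-modified gate for global temporal modelling, and a depthwise separable head) can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: the class names (AttentionConvLSTMCell, GestureRecognizer), the channel sizes, and the specific way the attention map re-weights the input gate are assumptions, and the stand-in 3D block omits the residual and channel-fusion structure of the actual network.

```python
import torch
import torch.nn as nn


class AttentionConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell whose input gate is re-weighted by a spatial
    attention map (one plausible reading of an attention-modified gate
    structure); the paper's exact gating variant may differ."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # One convolution produces all four gate pre-activations at once.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=padding)
        # 1x1 convolution that yields a spatial attention map in [0, 1].
        self.attn = nn.Conv2d(in_channels + hidden_channels, 1, kernel_size=1)
        self.hidden_channels = hidden_channels

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=1)
        i, f, o, g = torch.chunk(self.gates(z), 4, dim=1)
        a = torch.sigmoid(self.attn(z))           # spatial attention map
        i = torch.sigmoid(i) * a                  # attention re-weights the input gate
        f, o, g = torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c


class GestureRecognizer(nn.Module):
    """Pipeline sketch: local 3D features -> recurrent global modelling ->
    depthwise separable head. Channel counts are illustrative only."""

    def __init__(self, num_classes, feat_channels=64, hidden_channels=64):
        super().__init__()
        # Stand-in for the 3D residual feature extractor (local spatiotemporal features).
        self.local3d = nn.Sequential(
            nn.Conv3d(3, feat_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(feat_channels), nn.ReLU(inplace=True))
        self.cell = AttentionConvLSTMCell(feat_channels, hidden_channels)
        # Depthwise separable block standing in for the multi-feature fusion head.
        self.head = nn.Sequential(
            nn.Conv2d(hidden_channels, hidden_channels, 3, padding=1,
                      groups=hidden_channels),           # depthwise
            nn.Conv2d(hidden_channels, hidden_channels, 1),  # pointwise
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(hidden_channels, num_classes))

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        feats = self.local3d(clip)                  # (B, C, T, H, W)
        b, c, t, h, w = feats.shape
        hx = feats.new_zeros(b, self.cell.hidden_channels, h, w)
        cx = feats.new_zeros(b, self.cell.hidden_channels, h, w)
        for step in range(t):                       # global temporal modelling
            hx, cx = self.cell(feats[:, :, step], (hx, cx))
        return self.head(hx)                        # class logits
```

As a usage illustration, GestureRecognizer(num_classes=27)(torch.randn(2, 3, 16, 112, 112)) would return logits of shape (2, 27) for a batch of two 16-frame RGB clips, 27 being the number of gesture classes in the Jester dataset.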
