Article

MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 139, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2019.112829

Keywords

Gesture recognition; Deep learning; Convolutional neural networks; Multimodal learning; Feature fusion; RGB-D video processing

Funding

  1. Centre National pour la Recherche Scientifique et Technique (CNRST) - Moroccan government [14UIZ2015]
  2. PPR2-2015 project

Abstract

Human gesture recognition has become a pillar of today's intelligent human-computer interfaces, as it typically provides more comfortable and ubiquitous interaction. Such expert systems have promising prospects in various applications, including smart homes, gaming, healthcare, and robotics. However, recognizing human gestures in videos remains one of the most challenging topics in computer vision because of confounding environmental factors such as complex backgrounds, occlusion, and varying lighting conditions. With the recent development of deep learning, many researchers have addressed this problem by building single deep networks to learn spatiotemporal features from video data. However, performance remains unsatisfactory because single deep networks cannot handle these challenges simultaneously; the extracted features therefore fail to capture both the relevant shape information and the detailed spatiotemporal variation of the gestures. One way to overcome these drawbacks is to fuse multiple features from different models learned on multiple vision cues. To this end, we present an effective multi-dimensional feature learning approach, termed MultiD-CNN, for human gesture recognition in RGB-D videos. The key to our design is to learn high-level gesture representations by taking advantage of Convolutional Residual Networks (ResNets) for training extremely deep models and Convolutional Long Short-Term Memory networks (ConvLSTM) for modeling time-series dependencies. More specifically, we first construct an architecture that simultaneously learns spatiotemporal features from RGB and depth sequences through 3D ResNets, whose outputs feed a ConvLSTM that captures the temporal dependencies between them; we show that this combination fuses appearance and motion information effectively. Second, to alleviate distractions from background and other variations, we propose a method that encodes the temporal information into a motion representation, from which a two-stream architecture based on 2D ResNets extracts deep features. Third, we investigate fusion strategies at different levels for blending the classification results, and we show that integrating multiple ways of encoding the spatial and temporal information yields robust and stable spatiotemporal feature learning with better generalization capability. Finally, we evaluate the investigated architectures on four challenging datasets, demonstrating that our approach outperforms prior art in both accuracy and efficiency. The results also affirm the value of embedding the proposed approach in other intelligent-system application areas. (C) 2019 Elsevier Ltd. All rights reserved.
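To make the first component of the abstract concrete, the PyTorch sketch below shows how a small 3D convolutional backbone (standing in for the paper's 3D ResNet) can feed a ConvLSTM that models temporal dependencies, with RGB and depth streams fused at the score level. All module names, layer sizes, and the averaging fusion rule here are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of the MultiD-CNN idea described in the abstract:
# a 3D-CNN backbone feeds a ConvLSTM, and per-modality class scores
# are blended by late fusion. Hyperparameters are placeholders.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: LSTM gates computed with 2D convolutions."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class StreamNet(nn.Module):
    """One modality stream: 3D conv backbone -> ConvLSTM over time -> scores."""
    def __init__(self, in_ch, n_classes, feat=32, hid=32):
        super().__init__()
        # Stand-in for the 3D ResNet; the real backbone is much deeper.
        self.backbone = nn.Sequential(
            nn.Conv3d(in_ch, feat, kernel_size=3, padding=1),
            nn.BatchNorm3d(feat), nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),
        )
        self.convlstm = ConvLSTMCell(feat, hid)
        self.head = nn.Linear(hid, n_classes)

    def forward(self, clip):                    # clip: (B, C, T, H, W)
        feats = self.backbone(clip)             # (B, feat, T, H', W')
        B, _, T, H, W = feats.shape
        h = feats.new_zeros(B, self.convlstm.hid_ch, H, W)
        c = torch.zeros_like(h)
        for t in range(T):                      # unroll ConvLSTM over time
            h, c = self.convlstm(feats[:, :, t], (h, c))
        pooled = h.mean(dim=(2, 3))             # global average pooling
        return self.head(pooled)

# Late fusion of RGB and depth streams by averaging softmax scores.
rgb_net, depth_net = StreamNet(3, 10), StreamNet(1, 10)
rgb = torch.randn(2, 3, 8, 64, 64)              # batch of 8-frame RGB clips
depth = torch.randn(2, 1, 8, 64, 64)            # aligned depth clips
scores = (rgb_net(rgb).softmax(-1) + depth_net(depth).softmax(-1)) / 2
print(scores.shape)                             # torch.Size([2, 10])

Averaging the two streams' class probabilities, as above, is only the simplest of the late-fusion strategies the abstract says the paper compares; the same skeleton could instead concatenate features before the classifier for an earlier fusion point.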
