Article

ZS-GR: zero-shot gesture recognition from RGB-D videos

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Journal

MULTIMEDIA TOOLS AND APPLICATIONS

Volume 82, Issue 28, Pages 43781-43796

Publisher

SPRINGER

DOI: 10.1007/s11042-023-15112-7

Keywords

Gesture recognition; Zero-shot learning; Vision transformer; Lingual embedding; BERT; Action recognition

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Smart Summary

Gesture Recognition is a challenging research area in computer vision. To address the annotation bottleneck, the authors formulate the problem of Zero-Shot Gesture Recognition and propose a two-stream model. By leveraging Vision Transformer models for human detection and visual feature representation, the model achieves state-of-the-art results.

Abstract

Gesture Recognition (GR) is a challenging research area in computer vision. To tackle the annotation bottleneck in GR, we formulate the problem of Zero-Shot Gesture Recognition (ZS-GR) and propose a two-stream model that takes two input modalities: RGB and Depth videos. To benefit from the capabilities of vision Transformers, we use two vision Transformer models, one for human detection and one for visual feature representation. We configure a Transformer encoder-decoder architecture as a fast and accurate human detection model to overcome the challenges of current human detection models. Using the human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation of the human body is obtained using a vision Transformer and an LSTM network. A semantic space maps the visual features to the lingual embedding of the class labels, obtained via a Bidirectional Encoder Representations from Transformers (BERT) model. We evaluated the proposed model on five datasets: Montalbano II, MSR Daily Activity 3D, CAD-60, NTU-60, and isoGD. The model obtains state-of-the-art results compared with existing ZS-GR models as well as Zero-Shot Action Recognition (ZS-AR) models.
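The zero-shot step described in the abstract, mapping visual features into a semantic space shared with the lingual embeddings of class labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projection `W`, the cosine-similarity matching rule, the 3-dimensional toy label vectors (stand-ins for BERT embeddings), and every function name below are assumptions chosen only to show the general idea of classifying a video into a class that was never seen during training.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(visual_feature, weights):
    """Map a visual feature into the semantic space with a linear projection.

    Each row of `weights` produces one semantic dimension. A linear map is
    an assumption made for this sketch, not a detail taken from the paper.
    """
    return [sum(w * x for w, x in zip(row, visual_feature)) for row in weights]


def zero_shot_predict(visual_feature, weights, label_embeddings):
    """Return the unseen class whose label embedding is closest to the video."""
    z = project(visual_feature, weights)
    return max(label_embeddings, key=lambda name: cosine(z, label_embeddings[name]))


# Toy label embeddings for unseen classes (hypothetical stand-ins for BERT vectors).
UNSEEN_LABELS = {
    "wave": [1.0, 0.0, 0.0],
    "point": [0.0, 1.0, 0.0],
    "clap": [0.0, 0.0, 1.0],
}

# Hypothetical projection from a 4-d visual feature to the 3-d semantic space.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hypothetical pooled spatio-temporal feature for one video.
video_feature = [0.1, 0.9, 0.2, 0.1]

print(zero_shot_predict(video_feature, W, UNSEEN_LABELS))  # prints "point"
```

In a real system the projection would be learned on seen classes only, and the label vectors would come from a pretrained language model; the matching rule at test time stays the same regardless of how many unseen classes are added.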
