4.6 Article

HandGCNN model for gesture recognition based voice assistance

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 81, Issue 29, Pages 42353-42369

Publisher

SPRINGER
DOI: 10.1007/s11042-022-13497-5

Keywords

Gesture recognition; Virtual voice; Sign language; Deep learning; Convolution neural network


In this research, hand gestures are captured and processed using deep learning models to assist individuals who cannot communicate through verbal language. A hand gesture dataset HandG is created using digital cameras and image augmentation, with a novel Convolutional Neural Network (CNN) model called HandGCNN achieving a high prediction accuracy of 99.13%. A real-time system is built using a webcam as the input receptor unit to recognize gestures and generate relevant audio for impaired individuals.
Communication plays an important role in today's world. Before the evolution of verbal communication, sign language was the only means of communication used by our ancestors. Verbal communication later evolved, and people from different regions began to speak different languages. However, some groups of people cannot express themselves through verbal language and instead use sign language to communicate. To bridge the gap between people who communicate through sign language and those who use verbal language, a system is designed that recognizes sign-language gestures, interprets them, and converts them into verbal language. Various studies have captured the hand signs of speech-impaired people through sensors such as leap motion sensors and cameras. This research focuses on improving gesture capture through a camera and processing the captured images with deep learning models. A hand gesture dataset, HandG, is created, comprising 20,600 images across 10 classes (2,060 images per category) collected with a digital camera and expanded through image augmentation. A novel Convolutional Neural Network (CNN) based model, termed HandGCNN, is proposed, achieving a high prediction accuracy of 99.13%. A real-time system with a webcam as the input receptor unit is built, which recognizes a sign and generates the audio relevant to it. The generated audio serves as voice assistance for speech-impaired people.
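The abstract does not describe the HandGCNN architecture, input resolution, or augmentation settings, so the following is only a minimal Keras sketch of a 10-class gesture classifier with on-the-fly augmentation; the layer sizes, the assumed 64x64 input, and the augmentation choices are illustrative assumptions, not the authors' design.

```python
# Illustrative sketch only: the HandGCNN architecture is not specified in the
# abstract, so every hyperparameter below is an assumption for demonstration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10           # the HandG dataset has 10 gesture categories
INPUT_SHAPE = (64, 64, 3)  # assumed input resolution

# Augmentation layers stand in for the image augmentation the authors used to
# grow the dataset to 2,060 images per class.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

def build_gesture_cnn():
    """Build a small CNN classifier for hand-gesture images (illustrative)."""
    model = models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        augment,                      # applied only during training
        layers.Rescaling(1.0 / 255),  # normalize raw pixel values
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```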
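The real-time voice-assistance loop described in the abstract (webcam capture, gesture prediction, spoken output) could look roughly like the sketch below. The model file name, the confidence threshold, the label-to-phrase mapping, and the use of OpenCV and pyttsx3 are all assumptions for illustration; the paper does not state how the authors implemented this stage.

```python
# Illustrative real-time loop, assuming a trained 10-class model saved as
# "handgcnn.keras" (hypothetical file name) and placeholder spoken phrases.
import cv2
import numpy as np
import pyttsx3
import tensorflow as tf

CLASS_PHRASES = [f"gesture {i}" for i in range(10)]  # placeholder phrases

model = tf.keras.models.load_model("handgcnn.keras")
engine = pyttsx3.init()  # offline text-to-speech engine

cap = cv2.VideoCapture(0)  # webcam as the input receptor unit
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Convert BGR to RGB, resize to the model's input size, add a batch axis.
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (64, 64)).astype(np.float32)
    probs = model.predict(img[np.newaxis, ...], verbose=0)[0]
    label = int(np.argmax(probs))
    if probs[label] > 0.9:  # speak only confident predictions (assumed threshold)
        engine.say(CLASS_PHRASES[label])
        engine.runAndWait()
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```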
