4.6 Article

HandGCNN model for gesture recognition based voice assistance

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 81, Issue 29, Pages 42353-42369

Publisher

SPRINGER
DOI: 10.1007/s11042-022-13497-5

Keywords

Gesture recognition; Virtual voice; Sign language; Deep learning; Convolution neural network

In this research, hand gestures are captured and processed using deep learning models to assist individuals who cannot communicate through verbal language. A hand gesture dataset HandG is created using digital cameras and image augmentation, with a novel Convolutional Neural Network (CNN) model called HandGCNN achieving a high prediction accuracy of 99.13%. A real-time system is built using a webcam as the input receptor unit to recognize gestures and generate relevant audio for impaired individuals.
Communication plays an important role in today's world. Before the evolution of verbal communication, sign language was the only means of communication used by our ancestors. Verbal communication later evolved, and people in different regions began to speak different languages. However, some groups of people cannot express themselves through verbal language and instead use sign language to communicate. To bridge the gap between people who communicate with sign language and those who use verbal language, a system is designed that recognizes sign language gestures, interprets them, and converts them into verbal language. Various studies have captured the hand signs of speech-impaired people through sensors such as leap motion sensors and cameras. This research focuses on improving gesture capture through a camera and processing the captured images with deep learning models. A hand gesture dataset, HandG, is created that contains 20,600 images across 10 classes (2,060 images per category), collected with a digital camera and expanded through image augmentation. A novel Convolutional Neural Network (CNN) based model, termed HandGCNN, is proposed and achieves a high prediction accuracy of 99.13%. A real-time system is built with a webcam as the input receptor unit; it recognizes the gesture and generates the corresponding audio, which serves as voice assistance for speech-impaired people.
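
The abstract does not disclose the exact HandGCNN architecture, so the following is only a minimal sketch of a 10-class gesture CNN with light image augmentation, written in Keras under assumed layer sizes and input resolution; none of these settings are taken from the paper.

```python
# Hypothetical sketch of a compact CNN for a 10-class hand-gesture task
# such as HandG. Layer widths, input size, and augmentation choices are
# assumptions, not the published HandGCNN configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10          # ten gesture categories, as in the HandG dataset
IMG_SIZE = (128, 128)     # assumed input resolution

def build_gesture_cnn():
    """Return a small CNN for static hand-gesture classification."""
    model = models.Sequential([
        layers.Input(shape=(*IMG_SIZE, 3)),
        # Light on-the-fly augmentation, analogous in spirit to the image
        # augmentation used to grow the dataset to 2,060 images per class.
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.05),
        layers.Rescaling(1.0 / 255),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    build_gesture_cnn().summary()
```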
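For the real-time webcam-to-voice stage, the paper names neither the capture nor the text-to-speech tooling. The sketch below assumes OpenCV for frame capture and pyttsx3 for speech, and assumes a trained model saved as handg_cnn.h5 with placeholder class names; it only illustrates the described pipeline of recognizing a gesture and speaking the result.

```python
# Hypothetical real-time loop: read webcam frames, classify each frame with a
# trained gesture model, and speak the predicted label. OpenCV, pyttsx3, the
# model file name, and the label list are all assumptions.
import cv2
import numpy as np
import pyttsx3
import tensorflow as tf

LABELS = [f"gesture_{i}" for i in range(10)]            # placeholder class names
model = tf.keras.models.load_model("handg_cnn.h5")      # assumed model file
engine = pyttsx3.init()                                  # offline text-to-speech
cap = cv2.VideoCapture(0)                                # webcam as input receptor

while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(cv2.resize(frame, (128, 128)), cv2.COLOR_BGR2RGB)
    probs = model.predict(np.expand_dims(img, 0), verbose=0)[0]
    label = LABELS[int(np.argmax(probs))]
    cv2.putText(frame, label, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("HandG demo", frame)
    engine.say(label)          # voice assistance for the recognized gesture
    engine.runAndWait()
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```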
