Article

ShuffleNetv2-YOLOv3: a real-time recognition method of static sign language based on a lightweight network

Journal

SIGNAL IMAGE AND VIDEO PROCESSING
Volume 17, Issue 6, Pages 2721-2729

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s11760-023-02489-z

Keywords

YOLOv3; Convolutional neural network; Gesture recognition; Object detection; Network lightweight


Abstract

To better meet the communication needs of hearing-impaired people and the public, it is of great significance to recognize sign language quickly and accurately on embedded platforms and mobile terminals. YOLOv3, introduced by Joseph Redmon and Ali Farhadi in 2018, greatly improved detection speed over the original YOLO while maintaining considerable accuracy. However, YOLOv3 is still too heavy to deploy on mobile terminals. A static sign language recognition method based on the ShuffleNetv2-YOLOv3 lightweight model was therefore proposed. The model replaces the backbone network of YOLOv3 with ShuffleNetv2, which substantially improves recognition speed. Combined with the CIoU loss function, ShuffleNetv2-YOLOv3 maintains recognition accuracy while improving recognition speed. Recognition performance on a self-made sign language image set and a public database was evaluated using the F1 score and mAP value, and the ShuffleNetv2-YOLOv3 model was compared with the YOLOv3-tiny, SSD, Faster R-CNN, and YOLOv4-tiny models. The experimental results show that the proposed ShuffleNetv2-YOLOv3 model achieves a good balance between accuracy and speed of gesture detection while remaining lightweight. The F1 score and mAP value of the ShuffleNetv2-YOLOv3 model were 99.1% and 98.4%, respectively. Gesture detection speed on the GPU reaches 54 frames per second, which is better than the other models. A mobile terminal application of the proposed lightweight model was also evaluated: the minimum single-frame inference time on the CPU and GPU is 0.14 and 0.025 s per image, respectively, only 1/6.5 and 1/8.5 of the inference time of the original YOLOv3 model.
The ShuffleNetv2-YOLOv3 lightweight model supports quick, real-time recognition of similar static sign language gestures, laying a good foundation for real-time gesture recognition on embedded platforms and mobile terminals.
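The CIoU loss mentioned in the abstract augments the standard IoU term with a center-distance penalty and an aspect-ratio consistency term. The paper does not publish its implementation, so the following is only a minimal sketch of the standard CIoU formulation, assuming boxes are given in corner format (x1, y1, x2, y2); the function name and epsilon guard are illustrative choices, not taken from the paper.

```python
import math

def ciou_loss(box_p, box_g, eps=1e-9):
    """Sketch of the Complete-IoU loss: 1 - (IoU - rho^2/c^2 - alpha*v).

    box_p, box_g: predicted and ground-truth boxes as (x1, y1, x2, y2).
    """
    # Intersection area
    xi1, yi1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    xi2, yi2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)

    # Union area and IoU
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    union = wp * hp + wg * hg - inter
    iou = inter / (union + eps)

    # Squared distance between box centers (rho^2)
    rho2 = (((box_p[0] + box_p[2]) - (box_g[0] + box_g[2])) ** 2
            + ((box_p[1] + box_p[3]) - (box_g[1] + box_g[3])) ** 2) / 4.0

    # Squared diagonal of the smallest enclosing box (c^2)
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4.0 / math.pi ** 2) * (
        math.atan(wg / (hg + eps)) - math.atan(wp / (hp + eps))) ** 2
    alpha = v / ((1.0 - iou) + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)
```

Unlike a plain IoU loss, this formulation still produces a useful gradient signal when the predicted and ground-truth boxes do not overlap, which is one reason CIoU is favored for bounding-box regression in YOLO-style detectors.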

