Journal
PATTERN RECOGNITION
Volume 145, Issue -, Pages -
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2023.109901
Keywords
Hand gesture recognition; Attention mechanism; Multiscale; Feature aggregation; Mask branch
Hand gesture recognition is a challenging task in computer vision, and this paper proposes an end-to-end multiscale feature learning network for improving the effectiveness of hand gesture recognition. The network consists of a CNN-based backbone, a feature aggregation pyramid network, and three task-specific prediction branches. Experimental results show that the proposed method outperforms most state-of-the-art hand gesture recognition methods.
Hand gesture recognition from images is a longstanding computer vision task that can serve as a bridge for human-computer interaction and sign language translation. A number of methods have been proposed for hand gesture recognition (HGR); however, difficult scenarios such as varying hand gesture scales and complex backgrounds make them less effective. In this paper, we propose an end-to-end multiscale feature learning network for HGR, which consists of a CNN-based backbone network, a feature aggregation pyramid network (FAPN) embedded with a two-stage expansion-squeeze-aggregation (ESA) module, and three task-specific prediction branches. First, the backbone network extracts multiscale features from the original hand gesture images. Next, the FAPN embedded with the two-stage ESA module extensively exploits multiscale feature information and learns hand gesture-specific features at different scales. Then, the mask loss guides the network to locate hand-specific regions during the training stage. Finally, the classification and regression branches output the category and location of a hand gesture during both model training and prediction. The experimental results on two publicly available datasets show that the proposed method outperforms most state-of-the-art HGR methods.
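The role of the three prediction branches can be illustrated with a minimal sketch. It assumes only what the abstract states: the classification and regression branches run in both training and prediction, while the mask branch contributes only during training. The function `run_heads`, the `TOY_HEADS` table, and the plain-list feature format are hypothetical stand-ins chosen for illustration; they are not the paper's implementation, and the toy heads bear no relation to the actual network layers.

```python
from typing import Any, Callable, Dict, List

Feature = List[float]  # hypothetical stand-in for one pyramid-level feature map


def run_heads(
    pyramid_features: List[Feature],
    heads: Dict[str, Callable[[Feature], Any]],
    training: bool,
) -> Dict[str, List[Any]]:
    """Apply the prediction branches to every pyramid level.

    Classification and regression are always active; the mask branch
    is applied only when `training` is True (assumption taken from the
    abstract, which ties the mask loss to the training stage).
    """
    active = ["classification", "regression"]
    if training:
        active.append("mask")
    return {name: [heads[name](f) for f in pyramid_features] for name in active}


# Toy heads for illustration only; real branches would be learned layers.
TOY_HEADS: Dict[str, Callable[[Feature], Any]] = {
    # index of the largest activation, standing in for a class label
    "classification": lambda f: max(range(len(f)), key=f.__getitem__),
    # (min, max) of the activations, standing in for a box estimate
    "regression": lambda f: (min(f), max(f)),
    # positive activations, standing in for a hand-region mask
    "mask": lambda f: [v > 0 for v in f],
}

if __name__ == "__main__":
    features = [[0.1, 0.9, -0.2], [0.5, -0.3]]
    print(sorted(run_heads(features, TOY_HEADS, training=True)))
    print(sorted(run_heads(features, TOY_HEADS, training=False)))
```

Calling `run_heads` with `training=False` returns only the classification and regression outputs, which mirrors the abstract's description of the mask branch as a training-time guide.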