Article

Recurrent 3D Hand Pose Estimation Using Cascaded Pose-Guided 3D Alignments

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2022.3159725

Keywords

Hand pose estimation; alignment; cascaded neural networks; recurrent model


This paper investigates the impact of view-independent features on 3D hand pose estimation from a single depth image, and proposes a novel recurrent neural network model for 3D hand pose estimation. The model uses a cascaded 3D pose-guided alignment strategy for view-independent feature extraction and a recurrent hand pose module for modeling the dependencies among sequential aligned features. Experiments show that this method significantly improves the state-of-the-art accuracy on popular benchmarks with simple yet efficient alignment and network architectures.
3D hand pose estimation is a challenging problem in computer vision because of the many degrees of freedom of articulated hand motion and the large variation in viewpoint. As a consequence, similar poses observed from different viewpoints can look dramatically different. To deal with this issue, view-independent features are required to achieve state-of-the-art performance. In this paper, we investigate the impact of view-independent features on 3D hand pose estimation from a single depth image and propose a novel recurrent neural network for 3D hand pose estimation. In this network, a cascaded 3D pose-guided alignment strategy extracts view-independent features, and a recurrent hand pose module models the dependencies among the sequence of aligned features. In particular, our cascaded pose-guided 3D alignments are performed in 3D space in a coarse-to-fine fashion: first, hand joints are predicted and globally transformed into a canonical reference frame; second, the palm of the hand is detected and aligned; third, local transformations are applied to the fingers to refine the final predictions. The proposed recurrent hand pose module operates on the aligned 3D representation, extracting recurrent pose-aware features and iteratively refining the estimated hand pose. The recurrent module can be used both for single-view estimation and for sequence-based estimation with 3D hand pose tracking. Experiments show that our method improves the state of the art by a large margin on popular benchmarks while using simple yet efficient alignment and network architectures.
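The first, global stage of the cascade (transforming predicted joints into a canonical reference frame) can be illustrated with a small sketch. The abstract does not say how that frame is defined, so this example assumes a palm-based frame built from three joints: the wrist, the index MCP (knuckle), and the pinky MCP. The joint choice, the function names (`canonical_frame`, `to_canonical`, `from_canonical`), and the toy coordinates are all hypothetical. It is not the authors' implementation, and the palm and finger stages are omitted. The sketch shows the key property: after a rigid change of viewpoint, the canonical coordinates stay the same.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Tuple[Vec3, Tuple[Vec3, Vec3, Vec3]]  # (origin, (x_axis, y_axis, z_axis))


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    if n < 1e-9:
        raise ValueError("degenerate joints: cannot build a hand frame")
    return _scale(a, 1.0 / n)


def canonical_frame(wrist: Vec3, index_mcp: Vec3, pinky_mcp: Vec3) -> Frame:
    """Orthonormal right-handed frame anchored at the wrist (hypothetical choice)."""
    mid_mcp = _scale(_add(index_mcp, pinky_mcp), 0.5)
    y = _normalize(_sub(mid_mcp, wrist))                      # wrist -> knuckles
    across = _sub(index_mcp, pinky_mcp)                       # across the palm
    x = _normalize(_sub(across, _scale(y, _dot(across, y))))  # Gram-Schmidt: drop y part
    z = _cross(x, y)                                          # palm normal, unit length
    return wrist, (x, y, z)


def to_canonical(joints: Sequence[Vec3], frame: Frame) -> List[Vec3]:
    """Express camera-space joints in the canonical frame (translate, then rotate)."""
    origin, axes = frame
    return [tuple(_dot(_sub(p, origin), axis) for axis in axes) for p in joints]


def from_canonical(joints: Sequence[Vec3], frame: Frame) -> List[Vec3]:
    """Map canonical-frame joints back to camera space (inverse transform)."""
    origin, (x, y, z) = frame
    return [_add(origin, _add(_scale(x, q[0]), _add(_scale(y, q[1]), _scale(z, q[2]))))
            for q in joints]


def rotate_z_and_shift(p: Vec3, angle: float, shift: Vec3) -> Vec3:
    """Simulate a different camera viewpoint with a rigid motion."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + shift[0], s * p[0] + c * p[1] + shift[1], p[2] + shift[2])


# Toy joints in metres: wrist, index MCP, pinky MCP, index fingertip (made-up values).
hand = [(0.0, 0.0, 0.0), (0.02, 0.08, 0.0), (-0.03, 0.07, 0.0), (0.02, 0.15, 0.01)]
moved = [rotate_z_and_shift(p, 1.1, (0.3, -0.2, 0.5)) for p in hand]

frame_a = canonical_frame(*hand[:3])
frame_b = canonical_frame(*moved[:3])
canon_a = to_canonical(hand, frame_a)
canon_b = to_canonical(moved, frame_b)
# canon_a and canon_b agree up to floating-point error: the aligned
# coordinates no longer depend on the simulated viewpoint.
```

Gram-Schmidt orthogonalization keeps the frame orthonormal even when the three reference joints are not exactly perpendicular, so `from_canonical` exactly inverts `to_canonical`. In a coarse-to-fine cascade like the one described above, later stages would apply further, more local alignments (palm, then individual fingers) to subsets of the joints. Those stages are not shown here.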

