Article

Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB

Journal

Publisher

IEEE - Institute of Electrical and Electronics Engineers, Inc.
DOI: 10.1109/TCSVT.2020.3004453

Keywords

Three-dimensional displays; Pose estimation; Two dimensional displays; Feature extraction; Cameras; Convolutional neural networks; Solid modeling; Computer vision; hand pose estimation; graph CNNs; self-supervision

Funding

  1. National Key R&D Program of China [2018AAA0100602]
  2. Fundamental Research Funds for the Central Universities [201964022]
  3. National Natural Science Foundation of China [U1706218, 41927805]
  4. Shandong Provincial Natural Science Foundation, China [ZR2018ZB0852]


The paper explores the prediction of 3D hand poses from a single RGB image, utilizing multiple feature maps, graph-based convolutional neural networks, and self-supervised modules to improve the accuracy of hand pose estimation.
Hand pose estimation in 3D space from a single RGB image is a highly challenging problem due to self-geometric ambiguities, diverse textures, viewpoints, and self-occlusions. Existing work shows that a network structure that fuses multi-scale resolution subnets in parallel can more effectively preserve the spatial accuracy of 2D pose estimation. Nevertheless, the features extracted by traditional convolutional neural networks cannot efficiently express the unique topological structure of the hand key points, which are discrete yet correlated. Some applications of hand pose estimation based on traditional convolutional neural networks have demonstrated that the structural similarity between a graph and the hand key points can improve the accuracy of 3D hand pose regression. In this paper, we design and implement an end-to-end network for predicting 3D hand pose from a single RGB image. We first extract multiple feature maps at different resolutions and fuse them in parallel, and then model a graph-based convolutional neural network (GCN) module to predict the initial 3D hand key points. Next, we use 2D spatial relationships and 3D geometric knowledge to build a self-supervised module that eliminates the domain gap between 2D and 3D space. Finally, the final 3D hand pose is computed by averaging the 3D hand poses from the GCN output and the self-supervised module output. We evaluate the proposed method on two challenging benchmark datasets for 3D hand pose estimation. Experimental results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on both benchmarks.
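The two core ideas in the abstract, a graph convolution over the hand-skeleton topology and the final averaging of the GCN and self-supervised branch outputs, can be sketched in a few lines. This is a minimal illustration in plain Python (no deep-learning framework): the 21-joint kinematic tree, the toy inputs, and all function names are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch: (1) one graph convolution X' = A_hat @ X @ W over an
# assumed 21-joint hand skeleton, and (2) element-wise averaging of the
# GCN pose with the self-supervised branch's pose. Illustrative only.

N = 21  # hand key points: wrist (joint 0) + 5 fingers x 4 joints

# Edges of an assumed kinematic tree: each finger f is the chain
# wrist -> 4f+1 -> 4f+2 -> 4f+3 -> 4f+4.
EDGES = []
for f in range(5):
    chain = [0] + [4 * f + j for j in range(1, 5)]
    EDGES += list(zip(chain, chain[1:]))

def normalized_adjacency(edges, n):
    """Row-normalized adjacency with self-loops: A_hat = D^-1 (A + I)."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in A]

def graph_conv(A_hat, X, W):
    """One (linear) GCN layer: aggregate neighbor features, then project."""
    n, f_in, f_out = len(X), len(X[0]), len(W[0])
    AX = [[sum(A_hat[i][k] * X[k][c] for k in range(n)) for c in range(f_in)]
          for i in range(n)]
    return [[sum(AX[i][c] * W[c][o] for c in range(f_in)) for o in range(f_out)]
            for i in range(n)]

def fuse_poses(pose_gcn, pose_ssl):
    """Final pose: element-wise average of the two branch outputs."""
    return [[(a + b) / 2.0 for a, b in zip(ra, rb)]
            for ra, rb in zip(pose_gcn, pose_ssl)]

# Toy example: with an identity weight matrix, each output joint becomes the
# mean 3D position of itself and its skeleton neighbors (a smoothing step).
A_hat = normalized_adjacency(EDGES, N)
X = [[float(i), 0.0, 0.0] for i in range(N)]          # toy 3D joint positions
W = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
pose_gcn = graph_conv(A_hat, X, W)
final_pose = fuse_poses(pose_gcn, X)
```

The point of the skeleton-shaped adjacency is that each joint's prediction is refined using only its kinematic neighbors, which is the structural prior a grid-shaped CNN feature map cannot express directly.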

