Article

Differentiable Spatial Regression: A Novel Method for 3D Hand Pose Estimation

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 24, Issue -, Pages 166-176

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2020.3047552

Keywords

Three-dimensional displays; Pose estimation; Decoding; Two dimensional displays; Neural networks; Cameras; Space heating; Convolutional neural networks; deep learning; hand pose estimation; image recognition

Funding

  1. National Natural Science Foundation of China [62073097]
  2. Natural Science Foundation of Heilongjiang Province of China [LC2017022]
  3. Postdoctoral Scientific Research Developmental Fund of Heilongjiang Province of China [LBH-Q17071]

This paper proposes a novel Differentiable Spatial Regression method for 3D hand pose estimation, which combines the advantages of regression-based and detection-based methods. A specific model named SRNet is designed, utilizing a combination of 2D heatmaps and local offset maps to improve the accuracy and effectiveness of the estimation.
3D hand pose estimation from a single depth image is an essential topic in computer vision and human-computer interaction. Although the rise of deep learning has substantially improved accuracy, the problem remains hard because of the complex structure of the human hand. The two existing families of deep learning methods, regression-based and detection-based, either lose spatial information about the hand structure or lack direct supervision of the joint coordinates. In this paper, we propose a novel Differentiable Spatial Regression method that combines the advantages of both families so that each compensates for the other's shortcomings. Our method uses a spatial-form representation (SFR) to maintain spatial information and a differentiable decoder to establish direct supervision. Following the procedure suggested by our method, we design a particular model named SRNet, which uses a combination of 2D heatmaps and local offset maps as SFRs. Two modules, Plane Regression and Depth Regression, serve as differentiable decoders that regress plane coordinates and depth coordinates, respectively. An ablation study demonstrates the superiority of our method over the two combined methods, since the differentiable decoder leads to better SFRs that are learned by the network itself rather than designed by hand. Extensive experiments on four public datasets demonstrate that SRNet is comparable with state-of-the-art models.
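
The idea of decoding a spatial map into joint coordinates with a differentiable operation can be illustrated with a soft-argmax, a common differentiable decoder for heatmaps. The sketch below is an assumption made for illustration only: the abstract does not say how Plane Regression and Depth Regression are implemented, and the function names (`spatial_softmax`, `decode_plane`, `decode_depth`), the `beta` sharpness parameter, and the per-pixel depth map are all hypothetical. It uses plain Python lists so that it runs without third-party libraries.

```python
import math

def spatial_softmax(heatmap, beta=1.0):
    """Turn a 2D heatmap (list of rows) into a probability map that sums to 1."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def decode_plane(prob):
    """Soft-argmax: the expected (x, y) pixel position under the probability map."""
    x = sum(p * col for row in prob for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(prob) for p in row)
    return x, y

def decode_depth(prob, depth_map):
    """Probability-weighted average of a per-pixel depth (or depth offset) map."""
    return sum(p * d
               for prob_row, depth_row in zip(prob, depth_map)
               for p, d in zip(prob_row, depth_row))

# Hypothetical 3x3 heatmap for one joint, peaked at the centre pixel.
heatmap = [[0.0, 0.0, 0.0],
           [0.0, 5.0, 0.0],
           [0.0, 0.0, 0.0]]
depth_map = [[0.5] * 3 for _ in range(3)]  # hypothetical constant depth values

prob = spatial_softmax(heatmap)
x, y = decode_plane(prob)
z = decode_depth(prob, depth_map)
```

Because every step is a sum or an exponential, a loss placed on `(x, y, z)` can propagate gradients back to the maps themselves. This is the sense in which a differentiable decoder allows direct supervision of joint coordinates while the representation stays spatial.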
