Article

Latent Distribution-Based 3D Hand Pose Estimation From Monocular RGB Images

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2021.3055862

Keywords

Three-dimensional displays; Pose estimation; Two dimensional displays; Heating systems; Cameras; Neural networks; Feature extraction; 3D hand pose estimation; latent representation

Funding

  1. Fundamental Research Funds for the Central Universities [2019kfyXKJC024]

A novel compressed latent distribution representation is proposed to address the channel correspondence problem in 3D hand pose estimation from monocular RGB images. By interconnecting 2D and depth feature maps more directly, the proposed method effectively improves cross-dataset performance and achieves state-of-the-art results on benchmark datasets.
In this article, we propose a novel compressed latent distribution representation for 3D hand pose estimation from monocular RGB images to alleviate the channel correspondence problem. This problem arises when the 2D and depth coordinates are estimated from independent feature maps, so the 2D and depth channel sequences may not match during cross-dataset inference. In contrast, in our compressed latent distribution representation the 2D and depth feature maps for each joint are interconnected and inter-constrained more directly, which alleviates the channel correspondence problem and improves cross-dataset performance. Moreover, we design an efficient encoder-decoder network that maintains the resolution of the feature maps, enabling better hand feature extraction from monocular RGB images. The overall pipeline contains two branches: a 2D hand pose estimation branch based on a latent heatmap representation (LHR), and a 3D hand pose estimation branch based on our proposed latent distribution representation (LDR). The 2D estimation branch serves as guidance for the 3D branch, which simplifies the optimization of the overall network and leads to faster convergence during training. Results on several benchmark datasets (including STB, RHD, and the recently released InterHand2.6M) demonstrate that the proposed method achieves state-of-the-art (SOTA) performance.
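
To make the idea of coupling 2D and depth estimates per joint more concrete, the sketch below illustrates one generic way to do it; it is not the authors' LDR, whose details the abstract does not give. It assumes a single joint with a small grid of heatmap logits and a per-pixel depth map, takes the 2D location as a soft-argmax over the heatmap, and reads the depth through the same probability map so the two cannot drift to different channels. All function names (`softmax2d`, `soft_argmax_2d`, `coupled_depth`, `estimate_joint`) are hypothetical.

```python
import math


def softmax2d(logits):
    """Normalize a 2D grid of logits into a probability map that sums to 1."""
    peak = max(v for row in logits for v in row)  # subtract max for stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def soft_argmax_2d(prob):
    """Expected (x, y) pixel coordinate under a normalized heatmap."""
    x = sum(p * col for row in prob for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(prob) for p in row)
    return x, y


def coupled_depth(prob, depth_map):
    """Depth read out through the SAME joint's 2D probability map.

    Assumption for illustration: depth_map holds a per-pixel depth value
    for this joint, and the 2D heatmap acts as the weighting.
    """
    return sum(
        p * d
        for prob_row, depth_row in zip(prob, depth_map)
        for p, d in zip(prob_row, depth_row)
    )


def estimate_joint(logits, depth_map):
    """Return (x, y, z) for one joint from its heatmap logits and depth map."""
    prob = softmax2d(logits)
    x, y = soft_argmax_2d(prob)
    z = coupled_depth(prob, depth_map)
    return x, y, z


# Toy example: a sharp peak at row 1, column 2, with constant depth 0.5.
toy_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 50.0], [0.0, 0.0, 0.0]]
toy_depth = [[0.5, 0.5, 0.5] for _ in range(3)]
joint_xyz = estimate_joint(toy_logits, toy_depth)
```

Because both the 2D coordinate and the depth are derived from one probability map per joint, there is no separate depth channel ordering to mismatch. This is the intuition behind interconnecting the two, shown here under the stated assumptions only.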
