
Latent Distribution-Based 3D Hand Pose Estimation From Monocular RGB Images

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2021)

Journal

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Volume 31, Issue 12, Pages 4883-4894

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TCSVT.2021.3055862

Keywords

Three-dimensional displays; Pose estimation; Two dimensional displays; Heating systems; Cameras; Neural networks; Feature extraction; 3D hand pose estimation; latent representation

Funding

Fundamental Research Funds for the Central Universities [2019kfyXKJC024]

Automated Summary

A novel compressed latent distribution representation is proposed to address the channel correspondence problem in 3D hand pose estimation from monocular RGB images. By interconnecting 2D and depth feature maps more directly, the proposed method effectively improves cross-dataset performance and achieves state-of-the-art results on benchmark datasets.

Abstract

In this article, we propose a novel compressed latent distribution representation for 3D hand pose estimation from monocular RGB images to alleviate the channel correspondence problem. The channel correspondence problem occurs when the 2D and depth coordinates are estimated from independent feature maps, so the 2D and depth channel sequences may not match during cross-dataset inference. In contrast, we propose a compressed latent distribution representation in which the 2D and depth feature maps for each joint are interconnected and inter-constrained more directly, effectively alleviating the channel correspondence problem and improving cross-dataset performance. Moreover, we design an efficient encoder-decoder network that maintains the resolution of feature maps to enable better hand feature extraction from monocular RGB images. The overall pipeline contains two branches: a 2D hand pose estimation branch based on a latent heatmap representation (LHR), and a 3D hand pose estimation branch based on our proposed latent distribution representation (LDR). The 2D estimation branch serves as guidance for the 3D branch, which simplifies the optimization of the overall network and leads to faster convergence during training. The results on several benchmark datasets (STB, RHD, and the recently released InterHand2.6M) demonstrate that our proposed method achieves state-of-the-art (SOTA) performance.
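The abstract does not give the details of the LDR, so the following is only a generic sketch of the underlying idea, not the authors' method. It assumes a soft-argmax style decoding in which one per-joint probability map yields the 2D location and also weights a depth map, so the depth for a joint cannot be read from a channel that disagrees with its 2D channel. The function names `softmax2d` and `decode_joint`, the 3x3 grid, and the depth values are all hypothetical and chosen for illustration.

```python
import math


def softmax2d(logits):
    """Normalize a 2D grid of logits into a probability distribution."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def decode_joint(latent_logits, depth_map):
    """Decode (x, y, z) for one joint from a single shared distribution.

    x and y are the expected column and row under the distribution;
    z is the expected depth under the same distribution, which ties
    the depth estimate to the 2D estimate for this joint.
    """
    prob = softmax2d(latent_logits)
    x = sum(p * col for row in prob for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(prob) for p in row)
    z = sum(p * d
            for prob_row, depth_row in zip(prob, depth_map)
            for p, d in zip(prob_row, depth_row))
    return x, y, z


# Hypothetical 3x3 example: the latent map peaks sharply at row 1, column 2.
logits = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 50.0],
          [0.0, 0.0, 0.0]]
depth = [[0.1, 0.2, 0.3],
         [0.4, 0.5, 0.6],
         [0.7, 0.8, 0.9]]

x, y, z = decode_joint(logits, depth)
print(x, y, z)
```

With a sharply peaked map, the decoded 2D position sits at the peak and the decoded depth is the depth value stored at that same pixel. In a design with fully independent 2D and depth channels, nothing enforces that agreement, which is the mismatch the abstract calls the channel correspondence problem.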

