4.5 Article

Self-supervised method for 3D human pose estimation with consistent shape and viewpoint factorization

Journal

APPLIED INTELLIGENCE
Volume 53, Issue 4, Pages 3864-3876

Publisher

SPRINGER
DOI: 10.1007/s10489-022-03714-x

Keywords

3D pose estimation; Self-supervised learning; Consistent factorization; Hierarchical dictionary

This article introduces a self-supervised method that trains a 3D human pose estimation network without any extra 3D pose annotations. By fully disentangling the camera viewpoint from the 3D human shape, the method overcomes the projection ambiguity problem and uses a hierarchical dictionary for stable canonical reconstruction.
3D human pose estimation from monocular images has achieved great success thanks to sophisticated deep network architectures and large 3D human pose datasets. However, it remains an open problem when such datasets are unavailable, since estimating 3D human poses from monocular images is an ill-posed inverse problem. In this work, we propose a novel self-supervised method that effectively trains a 3D human pose estimation network without any extra 3D pose annotations. Unlike the commonly used GAN-based techniques, our method overcomes the projection ambiguity problem by fully disentangling the camera viewpoint information from the 3D human shape. Specifically, we design a factorization network that predicts the coefficients of the canonical 3D human pose and the camera viewpoint in two separate channels. Here, we represent the canonical 3D human pose as a combination of pose bases from a dictionary. To guarantee consistent factorization, we design a simple yet effective loss function that takes advantage of multi-view information. In addition, to generate robust canonical reconstructions from the 3D pose coefficients, we exploit the underlying 3D geometry of human poses to learn a novel hierarchical dictionary from 2D poses. The hierarchical dictionary has stronger 3D pose expressiveness than the traditional single-level dictionary. We comprehensively evaluate the proposed method on two public 3D human pose datasets, Human3.6M and MPI-INF-3DHP. The experimental results show that our method can maximally disentangle 3D human shapes and camera viewpoints, as well as reconstruct 3D human poses accurately. Moreover, our method achieves state-of-the-art results compared with recent weakly/self-supervised methods.
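
To make the factorization idea in the abstract concrete, the sketch below builds a canonical 3D pose as a linear combination of dictionary basis poses, applies a separately predicted camera rotation before projecting back to 2D, and penalizes disagreement between the canonical-pose coefficients obtained from two views of the same person. It is a minimal NumPy illustration, not the authors' implementation: the joint count, dictionary size, single-level dictionary, orthographic camera, and the random placeholders standing in for network predictions are all assumptions.

```python
# Minimal sketch of shape/viewpoint factorization for self-supervised 3D pose
# estimation. Dimensions, function names, and the single-level dictionary are
# illustrative assumptions; the paper uses a learned hierarchical dictionary
# and a neural factorization network to predict the quantities below.
import numpy as np

J, K = 17, 12  # assumed number of joints and number of dictionary basis poses

def canonical_pose(coeffs, dictionary):
    """Canonical 3D pose as a linear combination of dictionary basis poses.
    coeffs: (K,), dictionary: (K, 3, J) -> (3, J)."""
    return np.tensordot(coeffs, dictionary, axes=1)

def project(pose_3d, rotation):
    """Rotate the canonical pose into the camera frame and apply an
    orthographic projection. pose_3d: (3, J), rotation: (3, 3) -> (2, J)."""
    return (rotation @ pose_3d)[:2]

def multiview_consistency(coeffs_view_a, coeffs_view_b):
    """Consistency term: two views of the same person should yield the same
    canonical-pose coefficients, since the viewpoint is absorbed by the
    per-view rotations."""
    return np.mean((coeffs_view_a - coeffs_view_b) ** 2)

# Toy usage with random placeholders standing in for network predictions.
rng = np.random.default_rng(0)
B = rng.standard_normal((K, 3, J))                  # dictionary of basis poses
c_a, c_b = rng.standard_normal(K), rng.standard_normal(K)
R_a, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal "camera"

pose2d_a = project(canonical_pose(c_a, B), R_a)     # reprojected 2D pose, (2, J)
loss = multiview_consistency(c_a, c_b)
print(pose2d_a.shape, float(loss))
```

In a training loop, a reprojection term comparing the projected pose with the observed 2D keypoints would be combined with this consistency term, which is what forces the viewpoint channel, rather than the shape coefficients, to explain view-dependent variation.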
