4.7 Article

A Local Correspondence-Aware Hybrid CNN-GCN Model for Single-Image Human Body Reconstruction

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 25, Issue -, Pages 4679-4690

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2022.3180218

Keywords

3D human reconstruction; monocular image; convolutional neural networks; graph convolutional neural networks; SMPL

Ask authors/readers for more resources

This study proposes a hybrid model to estimate a 3D human body mesh by combining model-based and model-free approaches. It first utilizes a convolutional neural network (CNN) to generate a coarse human mesh, and then refines the mesh's vertex coordinates using a graph convolutional neural network (GCN). Experimental results demonstrate that this hybrid model outperforms existing methods in human reconstruction.
Reconstructing a 3D human body mesh from a monocular image is a challenging inverse problem because of occlusion and complicated human articulations. Recent deep learning-based methods have made significant progress in single-image human reconstruction. Most of these works are either model-based methods or model-free methods. However, model-based methods always suffer detail losses due to the limited parameter space, and model-free methods are hard to directly recover satisfactory results from images due to the use of a shared global feature for all vertices and the domain gap between 2D regular images and 3D irregular meshes. To resolve these issues, we propose a hybrid model, which combines the advantages of both model based approach and model-free approach to estimate a 3D human mesh in a coarse-to fine manner. Initially, we utilize a convolutional neural network (CNN) to estimate the parameters of a Skinned Multi-Person Linear Model (SMPL), which allows us to generate a coarse human mesh. After that, the vertex coordinates of the coarse human mesh are further refined by a graph convolutional neural network (GCN). Unlike previous GCN-based methods, whose vertex coordinates are recovered from a shared global feature, we propose a LOcal CorRespondence-Aware (LOCRA) module to extract local special features for each vertex. To make the local features related to the human pose, we also add a keypoint-related loss to supervise the training process of the LOCRA module. Experiments demonstrate that our hybrid model with the LOCRA module outperforms existing methods on multiple public benchmarks.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available