1 Introduction

A 3D model of the human body is required in many applications, such as video games, e-commerce, virtual reality and biomedical research. It is therefore important to have robust and accurate methods for recovering models of humans from one or several RGB images. This is, however, a difficult problem due to non-rigid motion, varying clothing and complex articulation, which makes 3D body reconstruction a challenging and interesting task in computer vision.

Aiming at acquiring a realistic and personalized 3D human body effectively, several methods have been proposed during the past decades, many relying on expensive active reconstruction equipment or on improving reconstruction algorithms based on structure from motion. 3D scanners or multiple calibrated cameras in a controlled environment can produce 3D models with very high accuracy [1]. The disadvantage of such methods is that these systems are very expensive and relatively complicated to build.

Besides these scanning systems, another line of research obtains 3D models from images acquired by ordinary cameras or depth sensors, using stereo reconstruction or fusion algorithms [5,6,7]. These methods do not require expensive equipment or complicated set-ups, but instead rely on computationally expensive computer vision algorithms. Structure from motion (SfM) can reconstruct a 3D model of a person in a static pose from a moving camera. Using depth sensors, for instance the Kinect, one can also obtain a 3D model by fusing the geometry observed from different view-points. These methods do not require any prior information, such as a model of the human shape.

Although these ideas have led to considerable progress, there is still a need for simpler methods to reconstruct a 3D human body model. With the remarkable progress of human pose estimation based on deep neural networks (DNNs), estimated poses have been shown to provide useful information for the reconstruction. Therefore, methods based on strong prior information have been proposed that reconstruct 3D models with good performance. These methods estimate a 3D human body model from a single monocular RGB image by fitting a statistical human body model to the human pose predicted by a DNN [22, 23]. However, one image is in many cases not sufficient for a sufficiently accurate 3D reconstruction, due to self-occlusion and complicated articulated motion.

In this paper we propose to use several (e.g. a sequence of) RGB images acquired from different viewpoints to reconstruct the 3D human body, based on the skinned multi-person linear model (SMPL) [2]. We construct an energy function that measures the difference between the 2D joint points of the RGB images and the 2D joint points of the projected SMPL model. The 2D joint points of the RGB images are predicted by OpenPose [3]. The difference between our method and SfM-based methods is that we only use the estimated joint positions to reconstruct the 3D model. At the same time, the camera orientations are treated as additional parameters when the energy function is minimized. The advantage is that several images from different viewpoints provide more accurate 3D information, while the number of images used in our method is in general smaller than for SfM-based methods. Experiments on synthetic data and Human3.6M [4] show that our method obtains more accurate pose estimates and 3D shapes than similar methods based on a single image.

2 Related Work

As in [25], related work can roughly be divided into two categories: methods that do not use parametric models and methods based on parametric models.

Non-parametric methods typically reconstruct 3D models from images acquired by a camera from different viewpoints or from the fusion of depth sensor data. Accurate results can be obtained without any strong prior information, but the person has to stand still during capture and the computation is complex and time-consuming. The most well-known algorithm is KinectFusion [5], which creates 3D models in real time by incrementally fusing partial scans from a moving RGB-D sensor. It performs well for rigid objects, but is not designed for articulated motion. For the 3D reconstruction of a static person, several approaches [6, 7] inspired by KinectFusion have therefore been proposed. These methods cannot achieve satisfying results for a moving person, since the human body typically deforms non-rigidly between different views. DynamicFusion [8], the pioneering work on reconstruction of non-rigid objects, can reconstruct 3D geometry in real time for a slowly moving person. Other methods, such as KillingFusion [9] and BodyFusion [12], improve on DynamicFusion. However, these approaches are only suitable for slow motion and have high computational complexity. To obtain more accurate results, multiple Kinect sensors or several calibrated cameras can be used to create 3D human body models. In [10], the authors use eight Kinects to obtain a 3D model with high accuracy. Multiple cameras are also used in [1, 11] to reconstruct the 3D human body. However, building a system with eight Kinects, or an indoor environment like the one in [1], is technically challenging and too expensive for many practical applications.

Parametric model-based methods rely on a template which provides strong prior information during the reconstruction. The template can be reconstructed from depth data or taken from a pre-computed human body model. In [13,14,15], non-rigid registration algorithms are proposed to register a pre-scanned model to partial depth data acquired by a Kinect. In [14], a template is obtained by registering several high-quality partial scans, and a personalized 3D model is then reconstructed by fitting to this template. Other algorithms [16, 17] follow similar ideas but use more elaborate information or hardware to improve accuracy and efficiency. Besides pre-scanned templates, a number of statistical human body models trained on registered human body scans have been proposed, such as SCAPE [18] and SMPL [2] (see also [19]). In [20] the authors fit the SCAPE model to a depth image to obtain a 3D model. Delta, an improved SCAPE model, is proposed in [21] together with a detailed body reconstruction algorithm. In [22], the authors propose to fit the SMPL model using 2D joint points predicted by a DNN-based method. Huang et al. [23] use a similar idea but focus on video, exploiting temporal information. In [24] an end-to-end adversarial learning method is used to estimate the human pose and shape parameters of the SMPL model. Alldieck et al. [25] propose an algorithm that first computes a consensus shape and then uses both pose and consensus shape to fit the SMPL model, obtaining better results.

Fig. 1. The overview of our method.

3 Method

Our aim is to obtain a 3D model of a human body from several RGB images taken from different view-points. Our approach is inspired by the work in [22], where the 3D human body model is estimated from a single RGB image. Although the method in [22] achieves reasonable accuracy, the error is still noticeable in many cases, since one RGB image cannot supply enough information. As an improvement, we propose to use several RGB images taken from different view-points to reconstruct the 3D model. This leads to a more challenging optimization problem, since the motion of the cameras is unknown, and we need to introduce the camera parameters as variables to estimate. First, we estimate the positions of the 2D joint points of the person in the images using OpenPose. Then, the SMPL model is fitted to the pose of the person in the different views by optimizing an energy function in which the camera parameters are included. Finally, the pose, shape and camera parameters are estimated to obtain the 3D model of the human. The pipeline of our method is summarized in Fig. 1. In the following, we first introduce the SMPL model, then the energy function, and finally the optimization that yields the estimates of the camera parameters as well as the pose and shape parameters of the 3D human model.
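
As a concrete illustration of the first step, a minimal sketch for reading the OpenPose detections is given below. It assumes the 'pose_keypoints_2d' JSON layout of recent OpenPose releases and simply takes the first detected person; both the file layout and the helper name are assumptions made for illustration, not details fixed by our method.

```python
import json
import numpy as np

def load_openpose_keypoints(path):
    # Assumed layout of recent OpenPose JSON output: a flat list of
    # (x, y, confidence) triples per person under 'pose_keypoints_2d'.
    with open(path) as f:
        data = json.load(f)
    kp = np.array(data['people'][0]['pose_keypoints_2d']).reshape(-1, 3)
    return kp[:, :2], kp[:, 2]  # (K, 2) joint positions, (K,) confidences
```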

3.1 SMPL Model

The SMPL model encodes both pose and shape parameters [22]. The pose is given by the parameter \(\varvec{\theta }\), which represents the relative rotations of the 23 body joints in the kinematic tree together with the global rotation of the root joint. The shape is represented by the parameter \(\varvec{\beta }\), which describes the strength of each mode of a shape space obtained by principal component analysis (PCA) of a registered training set. The pose parameters are represented as a vector \(\varvec{\theta }\in {\mathbb {R}^{72}}\) and the shape parameters as a vector \(\varvec{\beta }\in {\mathbb {R}^{10}}\).

The output of the SMPL model, for given pose and shape parameters, is a mesh with \(N=6890\) vertices and \(F=13776\) faces, \(M(\varvec{\theta },\varvec{\beta })\in \mathbb {R}^{N\times 3}\). In this model, the 3D joints are obtained by linear regression from the surface mesh vertices and are thus a function of the pose and shape coefficients. Therefore, the pose and shape parameters can be estimated by optimizing an energy function based on the joint points.
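
Since the 3D joints are a linear function of the mesh vertices, the regression step can be written in a few lines. The following sketch uses random placeholder data instead of the learned SMPL regressor, so the array contents are hypothetical:

```python
import numpy as np

N_VERTS, N_JOINTS = 6890, 24

# Placeholder stand-ins for the posed SMPL mesh M(theta, beta) and the
# learned joint regressor shipped with the SMPL model.
vertices = np.random.rand(N_VERTS, 3)
J_regressor = np.random.rand(N_JOINTS, N_VERTS)
J_regressor /= J_regressor.sum(axis=1, keepdims=True)  # rows sum to one

# Each 3D joint is a fixed linear combination of mesh vertices, so the
# joints are differentiable in the pose and shape parameters.
joints_3d = J_regressor @ vertices  # shape (24, 3)
```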

3.2 Pose and Shape Fitting

The approach in [22] is called SMPLify; it fits the projection of the 3D joints of the SMPL model to 2D joint points predicted by a CNN-based method. The advantage of this method is that only one image is needed to obtain a 3D model. However, a disadvantage of SMPLify is that in some situations one image does not contain enough information for an accurate 3D reconstruction (due to self-occlusion, articulated motion and ambiguous poses). Other methods, based on traditional SfM pipelines, require a large number of images from different views and are computationally intensive. Therefore, we propose to extend SMPLify to several images from different views: more images provide stronger constraints, while the number of images remains small. The difficulty with this idea is that the parameters of the cameras for the different views are unknown, which makes the projection of the joint points of the SMPL model difficult. Our solution is to treat the camera parameters, together with the pose and shape of the SMPL model, as variables of the energy function during the optimization. In this way we obtain not only an estimate of the pose and shape but also an estimate of the camera parameters (position and orientation).

The energy function contains three parts: the pose-fitting term, the pose parameter regularization term and the shape parameter regularization term. We define the energy function as:

$$\begin{aligned} E(\varvec{\theta },\varvec{\beta },R_i) = E_{J}(\varvec{\theta },\varvec{\beta },R_i)+\lambda _{\theta }E_{\varvec{\theta }}(\varvec{\theta })+\lambda _{\beta }E_{\varvec{\beta }}(\varvec{\beta }), \end{aligned}$$
(1)

where \(E_{J}(\varvec{\theta },\varvec{\beta },R_i)\) is the pose-fitting term, \(E_{\varvec{\theta }}(\varvec{\theta })\) is the pose parameter regularization term, \(E_{\varvec{\beta }}(\varvec{\beta })\) is the shape parameter regularization term and \(\lambda _{\theta }\) and \(\lambda _{\beta }\) are weights. From the energy function, the pose \(\varvec{\theta }\), the shape \(\varvec{\beta }\) and the camera rotations \(R_i\) can be estimated through

$$\begin{aligned} \hat{\varvec{\theta }},\hat{\varvec{\beta }},\hat{R}_i=\mathop {\arg \min }\limits _{\varvec{\theta },\varvec{\beta },R_i}E(\varvec{\theta },\varvec{\beta },R_i). \end{aligned}$$
(2)

The most important term is \(E_{J}\) in our method and it is defined as

$$\begin{aligned} E_{J}(\varvec{\theta },\varvec{\beta },R_i)=\sum _{i=1}^{N}\sum _{k=1}^{K}\rho (\varPi _i(J_{S,k})-J_{2d,k}^{(i)}), \end{aligned}$$
(3)

where N is the number of images, K is the number of joint points, \(J_{S,k}\) is the k-th 3D joint point of the SMPL model, \(\varPi _i\) is the projection of the i-th camera, \(J_{2d,k}^{(i)}\) is the k-th 2D joint point estimated by OpenPose in the i-th image and \(R_i\) is the rotation of the i-th camera. The error \(\rho \) is measured by the Geman-McClure function [26], which gives robustness to large noise and outliers. This function is defined as

$$\begin{aligned} \rho (x)=\frac{x^2}{\sigma ^2+x^2}, \end{aligned}$$
(4)

where x is the residual of the 2D joint points and \(\sigma \) is a constant. The projection of the 3D joint points of the SMPL model into the i-th camera is

$$\varPi _i(J_{S})=R_iJ_{S}+t_i,$$

where \(t_i\) is the translation of the i-th camera. The translation is calculated separately from the shoulders and hips, under the assumption that the person is standing roughly parallel to the image plane. Because the projection is linear, the derivatives of the error function are easy to compute during the optimization.
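
A compact sketch of the data term (3) under this linear camera model might look as follows. Dropping the z-coordinate after the rigid transform is an assumption made for the sketch; the paper only states that the camera model is linear.

```python
import numpy as np

def geman_mcclure(res, sigma=100.0):
    # Robust error of Eq. (4), applied to per-joint 2D residuals.
    sq = np.sum(res ** 2, axis=-1)
    return sq / (sigma ** 2 + sq)

def data_term(joints_3d, joints_2d, rotations, translations):
    # joints_3d:  (K, 3)  SMPL joints J_S
    # joints_2d:  (N, K, 2)  OpenPose detections, one set per image
    # rotations:  (N, 3, 3), translations: (N, 3)
    E = 0.0
    for R, t, j2d in zip(rotations, translations, joints_2d):
        cam = joints_3d @ R.T + t   # Pi_i(J_S) = R_i J_S + t_i
        proj = cam[:, :2]           # assumed linear drop of the z-coordinate
        E += geman_mcclure(proj - j2d).sum()
    return E
```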

The pose regularization term is needed to avoid the knees and elbows bending unnaturally, and it is defined as

$$\begin{aligned} E_{\varvec{\theta }}(\varvec{\theta })=\alpha \sum \limits _i \exp (\theta _i), \end{aligned}$$
(5)

where \(\theta _i\) denotes the rotation angles of the elbow and knee joints and \(\alpha \) is a constant that controls the penalization. The shape regularization term is defined as

$$\begin{aligned} E_{\varvec{\beta }}(\varvec{\beta })=\sum \limits _i \beta _i, \end{aligned}$$
(6)

i.e. as the sum of the elements of \(\varvec{\beta }\).
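
Continuing the data-term sketch above, the two regularizers and the total energy (1) are straightforward to assemble. The index set for the elbow and knee angles below is a placeholder, since the paper does not list the exact entries of \(\varvec{\theta }\) that are penalized:

```python
import numpy as np

BEND_IDX = [55, 58, 12, 15]  # hypothetical indices of elbow/knee angles

def pose_reg(theta, alpha=10.0):
    # E_theta of Eq. (5): exponential penalty on unnatural bending.
    return alpha * np.sum(np.exp(theta[BEND_IDX]))

def shape_reg(beta):
    # E_beta of Eq. (6): sum of the shape coefficients.
    return np.sum(beta)

def total_energy(theta, beta, rotations, translations,
                 joints_3d, joints_2d, lam_theta, lam_beta):
    # Full energy of Eq. (1); data_term is the sketch given above. In a
    # full implementation, joints_3d would be regenerated from
    # (theta, beta) by the SMPL model at every evaluation.
    return (data_term(joints_3d, joints_2d, rotations, translations)
            + lam_theta * pose_reg(theta) + lam_beta * shape_reg(beta))
```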

3.3 Optimization

The optimization is performed in two steps. In the first step the camera translations are estimated. Here the focal length of the camera is assumed to be known. The camera translation is estimated by fitting the shoulders and hips of the SMPL model to the predicted 2D pose.
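
Under the stated assumption that the person stands roughly parallel to the image plane, the depth component of the translation can be initialised by similar triangles from the torso size. The following is a sketch of one plausible way to implement this step, with hypothetical joint indices, not a transcription of our code:

```python
import numpy as np

def init_translation(joints_3d, joints_2d, focal,
                     shoulder=2, hip=9):  # hypothetical joint indices
    # Similar triangles: depth ~ focal * (3D torso size / 2D torso size),
    # valid when the torso is roughly parallel to the image plane.
    d3 = np.linalg.norm(joints_3d[shoulder] - joints_3d[hip])
    d2 = np.linalg.norm(joints_2d[shoulder] - joints_2d[hip])
    return np.array([0.0, 0.0, focal * d3 / d2])
```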

In the second step the model is fitted by minimizing the full energy (1). The weights \(\lambda _{\theta }\) and \(\lambda _{\beta }\) are decreased gradually during the optimization. The minimization is based on Powell's dogleg method, as provided by the Python modules OpenDR [27] and Chumpy [28]. For four different views (image size \(320\times 240\)), the minimization takes about 2 min on a desktop machine.
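
The staged minimization can be sketched as follows. SciPy with numerical gradients is used here as a stand-in for the Powell dogleg solver of Chumpy/OpenDR; the weight schedule is the one reported in Sect. 4, and the packing of \(\varvec{\theta }\), \(\varvec{\beta }\) and the camera rotations into a single vector x is left abstract.

```python
from scipy.optimize import minimize

# Weight schedule (lambda_theta, lambda_beta) from Sect. 4.
SCHEDULE = [(404, 100), (404, 50), (58, 5), (4.78, 1)]

def fit(energy, x0):
    # `energy(x, lam_theta, lam_beta)` evaluates Eq. (1) for a parameter
    # vector x packing theta, beta and the camera rotations.
    x = x0
    for lam_t, lam_b in SCHEDULE:
        res = minimize(lambda v: energy(v, lam_t, lam_b), x,
                       method='BFGS',  # stand-in for Powell's dogleg
                       options={'maxiter': 100})
        x = res.x  # each stage starts from the previous solution
    return x
```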

4 Experiment

In this section, experiments are presented to illustrate the performance of our method. In the first experiment, we generate a small synthetic dataset based on SURREAL [29], in which a large number of synthetic human bodies with different poses and shapes are created from the SMPL model. Since SURREAL only provides videos from one view, we generate three more images from other views. For real images, our method is evaluated on Human3.6M, a standard benchmark for human pose estimation.

In our experiments, the weights \((\lambda _{\theta },\lambda _{\beta })\) decrease over four stages as (404, 100), (404, 50), (58, 5) and (4.78, 1). The constant \(\sigma \) is set to 100 and \(\alpha \) to 10. The maximum number of iterations is 100 for every stage and the stopping criterion is that the error of the energy function is smaller than \(10^{-3}\). The experiments are implemented in Python and our desktop machine has a 4-core Intel i5-6500 CPU @ 3.20 GHz with 8 GB RAM.

4.1 Results on Synthetic Data

The synthetic images are generated based on SURREAL. We feed 100 pose and shape parameter sets from the training data of SURREAL into the SMPL model to generate 100 different 3D human bodies. Then, for each human body model, four images of size \(320\times 240\) are rendered by cameras at different view-points. At the same time, the joint points of the human body in each image are computed. (We will provide this small dataset online.) With our method, the 3D models are estimated from two, three and four views, respectively. For comparison, the 3D models obtained by SMPLify are also given. In order to compare the results quantitatively, the evaluation metric is defined as:

$$\begin{aligned} Error=\frac{1}{N}\sum _{i=1}^{N}||J_{i}^{gt}-J_{i}^{est}||_2, \end{aligned}$$
(7)

where \(J_{i}^{gt}\) is the ground truth of the i-th 3D joint point, \(J_{i}^{est}\) is the estimated 3D joint point and N is the number of joint points. In this experiment there are in total 24 joint points in the SMPL model.
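
For reference, the metric (7) amounts to the mean Euclidean distance over the joints and can be computed directly:

```python
import numpy as np

def mean_joint_error(j_gt, j_est):
    # Eq. (7): mean per-joint Euclidean error over the N joint points.
    # j_gt, j_est: (N, 3) ground-truth and estimated 3D joints.
    return np.mean(np.linalg.norm(j_gt - j_est, axis=1))
```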

Fig. 2. The errors of the 100 samples and the mean error. Left: the errors of the 100 samples. Right: the mean error over the 100 samples for SMPLify and for our method with two, three and four images.

Table 1. The mean errors of SMPLify compared with our method using 2, 3 and 4 images, respectively.

The errors for the 100 samples and the mean errors for different numbers of images are given in Fig. 2. For most samples, the error is smaller when more images are used. In some cases, the error of our method with two or three images is larger. This is because images from two different views may influence each other, since a camera at a side position cannot capture all of the joint points. The mean error over the 100 samples is also given in Table 1. We can see that the mean error decreases when more images are utilized and that our method outperforms SMPLify, which shows that more images indeed provide more useful information. Figure 3 shows some qualitative results; the layout of each row is described in the caption. Comparing the SMPLify result with ours in the first column, our method performs better, especially for the last example. The images from the other views show that the camera estimates of our method are accurate, which demonstrates the effectiveness of our method.

Fig. 3. Results on synthetic data. Each row corresponds to one person with unknown shape and pose. For each row, the left image in each column is the input image for frames one to four. The middle image in column one is the result of SMPLify using only the input image from frame one. The right image in each column is the result of our method using all four input frames.

4.2 Results on Human3.6M

There are in total 11 subjects (6 males, 5 females) in Human3.6M and every subject performs 15 actions. To test our method sufficiently, we choose S1 (female) and S6 (male) and evaluate on 8 actions: Directions, Discussion, Eating, Greeting, Phoning, Posing, Purchasing and Sitting. For each action, we sample the video every five frames and take in total 100 frames. The results of SMPLify and of our method with four images are compared. The metric is again computed according to Eq. (7), now with 16 joint points, since Human3.6M provides 16 joints. The errors of every frame of the different actions for S1 and S6 are shown in Figs. 4 and 5. The mean errors over the 100 frames of each action for S1 and S6 are shown in Tables 2 and 3. These results show that our method obtains a more accurate estimation in most cases.

In addition, some images from the dataset are shown in Fig. 6. We can see from Fig. 6 that SMPLify makes obvious errors, for instance where arms and body occlude each other. Our method sometimes also has unexpected errors caused by side views, as in the fifth sample. The reason is that our method relies on all of the observed images: if one camera translation is estimated incorrectly, it affects the fit to the images from the other views, and the final result after optimization may be wrong. SMPLify, in contrast, only uses one image, and if this image is not captured from a side view, its result is sometimes better than ours. In general, however, our method achieves better estimates than SMPLify.

Fig. 4. The errors of every frame of the eight actions for S1.

Fig. 5. The errors of every frame of the eight actions for S6.

Table 2. The mean errors of the eight actions of S1.
Table 3. The mean errors of the eight actions of S6.
Fig. 6. Some samples from S1 and S6. For each sample, the input image is followed by the results of SMPLify and our method, from left to right.

5 Conclusion

We have proposed a method to reconstruct a 3D human body model from several RGB images taken from different view-points. Our approach starts by estimating the 2D joint points in the images using the DNN-based method OpenPose. Then, the statistical human body model SMPL is fitted to the predicted 2D joint points by minimizing an energy function over all images simultaneously. In this way, our method estimates both the pose and shape parameters of the human body and the camera parameters. Experiments on synthetic and real data demonstrate, quantitatively and qualitatively, that our method yields lower pose errors than a previous method based on a single image.

Our method also has some limitations. If the images are captured from a side view, the joint points can be very close to each other or even coincide, which makes our method unstable. Moreover, we mainly focus on the estimation of the pose, which implies that the shape of the reconstruction is less accurate. This is, however, a fundamental limitation of all methods that only use the joint positions and disregard the contours of the body.