1 Introduction

Human pose estimation, also known as human keypoint detection, has received extensive attention in recent years. Its primary purpose is to predict human joint locations from monocular RGB images. Human pose estimation is a classical mid-level computer vision task and can greatly facilitate related high-level tasks such as pedestrian detection [28] and action recognition [7].

Following the success of deep convolutional networks, current 2d human pose estimation methods perform well even in complex outdoor environments. Figure 1 shows typical 2d human pose estimation results predicted by stacked hourglass [18] on the Human3.6M dataset [11]. However, unlike 2d human pose estimation, it is challenging to obtain annotated data for the 3d human pose estimation task. Most 3d human pose datasets contain only indoor data collected in a laboratory environment, which leads to a lack of diversity. Thus, models tend to overfit when trained on such datasets. Besides, ambiguity is a widespread problem when mapping 2d to 3d, which also results in unreasonable predictions.

Fig. 1. Typical 2d human pose estimation results produced by the stacked hourglass model [18]. Images are from the Human3.6M dataset, on which the stacked hourglass model performs well.

In this paper, we propose a novel coarse-to-fine method for 3d human pose estimation. From our analysis, we find that current models usually produce large errors when predicting keypoints located at the ends of limbs, such as wrists and ankles. In contrast, joints like shoulders and hips are relatively easy to predict. Table 1 shows detailed per-joint error statistics for [14]. We assume that easy joints can help guide the prediction of hard joints; therefore, we propose a coarse-to-fine method that predicts different joints progressively. An intuitive way to deal with ambiguity in 3d human pose estimation is to leverage priors on human structure. For instance, Dabral et al. [5] use anatomically legal joint-angle constraints in their model. Here, we propose a set of limb length ratio (LLR) constraints to reduce the deviation of predicted joints from their true locations.

Our contributions can be summarized as follows:

  • We propose a coarse-to-fine method for the 3d human pose estimation task to improve the prediction accuracy of joints far from the torso. Based on a statistical analysis of predictions produced by a previous state-of-the-art method, we divide joints into three groups of different difficulty levels. Easy joints are predicted first and then used to facilitate the prediction of harder joints.

  • A set of human limb length ratio (LLR) constraints based on statistics of the physical human body structure is used to avoid unreasonable predictions, allowing the model to perform more robustly on hard joints.

  • By combining the coarse-to-fine model and LLR constraints, our method outperforms the baseline on the Human3.6M dataset. The improvement is especially significant for joints far from the torso.

Table 1. Detailed per-joint error statistics for [14]. Numbers denote the error of each joint in millimeters. Under protocol 2, the model predictions are post-processed with rigid alignment.

2 Related Work

Since our method is specifically designed for the 3d human pose estimation task, we first review recent work on it. We then review recent work on the use of human structure priors in human pose estimation.

2.1 3D Human Pose Estimation

3d human pose estimation has attracted increasing attention in recent years due to its potentially broad application prospects. The purpose of the task is to estimate accurate spatial coordinates of human keypoints from RGB images. Previous works [13, 22] have shown that human keypoint positions benefit generic action recognition tasks. At the current stage, it is almost impossible to predict 3d coordinates in the world coordinate system, as noted in [14]; thus most current methods predict coordinates in the camera coordinate system [5, 9, 25]. In this paper, our model predicts 3d human keypoint locations in the camera coordinate system as well.

Various types of methods, as well as diverse representations, have been proposed for 3d human pose estimation. A typical approach uses 3d coordinates to represent human keypoint locations and regresses them directly from a single RGB image, as proposed in [21]. However, the mapping from RGB images to 3d coordinates is so complex that it is challenging to learn the underlying relationship between images and coordinates. To overcome this problem, volumetric representations, which contain richer information than coordinates, have been used as supervision [21, 27]. Volumetric representations, however, lead to a huge number of model parameters and increased computational complexity. A compromise is to use 3d coordinates as supervision while leveraging 2d human pose predictions. With the help of powerful convolutional neural networks (CNNs), the performance of 2d human pose estimation has improved greatly in recent years. A simple yet effective method is to use 2d human pose predictions as input to regress the 3d coordinates of human keypoints [14]. Building on this work, [9] combines temporal information with 2d-to-3d pose regression, which allows the model to perform well. However, temporal information places high demands on the data, and such a model is computationally expensive, making it hard to use in practical applications.

These works make good progress, but it is worth mentioning that keypoints far from the torso fluctuate heavily in their predictions. This phenomenon is consistent with a problem in 2d pose estimation, as noted in [24]. In this paper, we propose a coarse-to-fine method that takes a 2d human pose prediction from a single image as input and predicts the 3d coordinates of human keypoints. We divide human keypoints into three groups according to difficulty: the further keypoints are from the human torso, the harder they are for a model to predict. Our model predicts easy keypoints first and then predicts medium and hard keypoints in turn, leveraging the earlier prediction results.

2.2 Human Structure Prior in Pose Estimation

In previous works, models often generate unreasonable predictions, which makes human structure priors indispensable in human pose estimation tasks. In 2d human pose estimation, [4] leverages generative adversarial networks to guide a model to learn human structure priors implicitly. [5] proposes angular constraints based on the prior that the range of motion of human joints is limited and symmetric. These constraints are reasonable, but the limb length ratio can be another useful constraint, whose distribution has been shown to obey specific rules [6]. In this paper, we propose a set of constraints based on the human limb length ratio, and experiments demonstrate that they help the model achieve better performance in 3d human pose estimation.

3 Method

In this section, we discuss the proposed method for 3d human pose estimation. We start with the coarse-to-fine method and then introduce the limb length ratio (LLR) constraint, which further improves the results.

Fig. 2. Keypoints grouped by prediction difficulty. Circles colored blue, orange, and red denote easy, medium, and hard joints respectively. The position of the hip is the midpoint of the left hip and right hip. (Color figure online)

Fig. 3. Network structure of our coarse-to-fine model. For a given RGB image, we first obtain 2d keypoint locations via a 2d human pose estimator. We then use the coarse-to-fine model to predict 3d keypoint coordinates from the 2d keypoints. Our method consists of three stages, predicting the positions of easy, medium, and hard joints in order. During the second and third stages, the model leverages predictions from the previous stage(s).

3.1 Coarse-to-Fine Model

In previous works, models usually perform worse when predicting keypoints far from the torso, such as wrists and ankles. To overcome this problem, we propose a coarse-to-fine method. We first divide keypoints into three groups according to prediction difficulty. From Table 1, we observe that the closer a keypoint is to the body torso, the more accurate the model prediction is. For instance, the model predicts the location of the head better than the elbows, and predicts the ankles worse than the knees. Thus we divide keypoints, according to their distance to the torso, into three groups: easy, medium, and hard, as shown in Fig. 2. According to Table 1, we classify the head, spine, thorax, hip, and shoulders as easy joints; the elbows and knees as medium joints; and the wrists and ankles as hard joints.
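To make the grouping concrete, a plausible encoding of the 16-joint skeleton is sketched below. The group membership follows Fig. 2, while the exact joint names and their ordering depend on the 2d estimator's skeleton convention and are assumptions here.

```python
# One plausible easy/medium/hard partition of the 16 predicted joints
# (Sect. 3.1, Fig. 2). "Hip" is the midpoint of the left and right hips.
JOINT_GROUPS = {
    "easy":   ["Hip", "RHip", "LHip", "Spine", "Thorax", "Head",
               "RShoulder", "LShoulder"],
    "medium": ["RKnee", "LKnee", "RElbow", "LElbow"],
    "hard":   ["RAnkle", "LAnkle", "RWrist", "LWrist"],
}
```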

Based on these different difficulty levels, we design a specific coarse-to-fine model, whose network structure is shown in Fig. 3. The input of our model is the 2d keypoint predictions produced by a 2d human pose estimator, and the output is the predicted 3d human keypoint coordinates. As shown in Fig. 3, our model contains three stages. In the first stage, we predict easy joints using a simple fully-connected network, which is effective for regressing 3d coordinates from 2d coordinates [14]. In the second and third stages, we predict medium and hard keypoints, taking both the 2d keypoints and the 3d predictions produced in the previous stage(s) as input. We can thereby leverage the predicted 3d joint coordinates as auxiliary information to guide the model toward more accurate predictions for challenging keypoints. To merge the 2d keypoints with the 3d predictions from previous stages, we adopt channel-wise self-attention blocks, as proposed in [10], which assign appropriate weights to the predicted 3d keypoint coordinates in the second and third stages. We compute the Euclidean distance between the 3d keypoint predictions and the ground truth as the keypoint loss \(L_K\),

$$\begin{aligned} L_K(x,y) = \frac{1}{m}\sum _{i=1}^{m}\Vert x_i - y_i \Vert , \end{aligned}$$
(1)

where x and y stand for the model prediction and the ground truth respectively, and m stands for the number of keypoints. Considering that our model produces predictions in three stages, the loss function is written as

$$\begin{aligned} L_{CTF}(x,y) = \theta _1 L_K(x_e,y_e) + \theta _2 L_K(x_m,y_m) + \theta _3 L_K(x_h,y_h) , \end{aligned}$$
(2)

where the subscripts e, m, and h denote easy, medium, and hard keypoints respectively.
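Before moving to the LLR constraint, a minimal PyTorch sketch of the three-stage model and the staged loss of Eqs. (1)-(2) follows. The layer widths, the SE-style gating used for fusion, and the default loss weights are illustrative assumptions, not values fixed by the paper.

```python
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    """Channel-wise self-attention (SE-style, cf. [10]) to re-weight the
    concatenated 2d/3d features; a sketch, not the paper's exact block."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class CoarseToFine(nn.Module):
    """Predict easy -> medium -> hard 3d joints, feeding earlier 3d
    predictions back as auxiliary input (Fig. 3)."""
    def __init__(self, n_easy=8, n_med=4, n_hard=4, width=1024):
        super().__init__()
        n2d = 2 * (n_easy + n_med + n_hard)   # flattened 2d input (1 x 32)
        def mlp(in_dim, n_joints):
            return nn.Sequential(
                nn.Linear(in_dim, width), nn.ReLU(),
                nn.Linear(width, 3 * n_joints))
        self.stage1 = mlp(n2d, n_easy)
        self.fuse2 = SEFusion(n2d + 3 * n_easy)
        self.stage2 = mlp(n2d + 3 * n_easy, n_med)
        self.fuse3 = SEFusion(n2d + 3 * (n_easy + n_med))
        self.stage3 = mlp(n2d + 3 * (n_easy + n_med), n_hard)

    def forward(self, kp2d):                  # kp2d: (B, 32)
        easy = self.stage1(kp2d)
        med = self.stage2(self.fuse2(torch.cat([kp2d, easy], dim=1)))
        hard = self.stage3(self.fuse3(torch.cat([kp2d, easy, med], dim=1)))
        return easy, med, hard

def keypoint_loss(pred, gt):
    """Eq. (1): mean Euclidean distance over the predicted joints."""
    return (pred.view(-1, 3) - gt.view(-1, 3)).norm(dim=1).mean()

def ctf_loss(preds, gts, thetas=(1.0, 1.0, 1.0)):
    """Eq. (2): weighted sum of the per-stage keypoint losses."""
    return sum(t * keypoint_loss(p, g) for t, p, g in zip(thetas, preds, gts))
```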

3.2 LLR Constraint

Human pose prior knowledge is helpful in the 3d human pose estimation task, and the human limb length ratio (LLR) is an important prior, studied in [6]. To the best of our knowledge, little research has focused on the LLR prior, which helps predict accurate 3d coordinates. In this paper, we propose a set of LLR constraints based on this prior. According to the research of De Leva [6], we can assume that the distribution of adult limb length ratios follows a normal distribution. Therefore, we can compute the mean and standard deviation of each limb length ratio over the dataset.
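As a concrete illustration, these statistics can be gathered once over the training annotations. The ratio pairs below are hypothetical examples, since the paper does not enumerate the chosen pairs, and the joint-index mapping is an assumption.

```python
import numpy as np

# Hypothetical ratio pairs: each entry relates two limbs, a limb being a
# pair of joint names; the paper does not enumerate its chosen pairs.
RATIO_PAIRS = [
    (("LShoulder", "LElbow"), ("LElbow", "LWrist")),   # upper vs lower arm
    (("LHip", "LKnee"), ("LKnee", "LAnkle")),          # thigh vs shank
]

def ratio_stats(poses3d, joint_index, pairs=RATIO_PAIRS):
    """Mean and standard deviation of each limb length ratio over the
    training set. poses3d: (N, 16, 3) ground-truth 3d poses;
    joint_index: name -> index mapping for the skeleton in use."""
    stats = []
    for (a1, a2), (b1, b2) in pairs:
        num = np.linalg.norm(poses3d[:, joint_index[a1]]
                             - poses3d[:, joint_index[a2]], axis=1)
        den = np.linalg.norm(poses3d[:, joint_index[b1]]
                             - poses3d[:, joint_index[b2]], axis=1)
        r = num / den
        stats.append((r.mean(), r.std()))
    return stats   # one (R_bar, s) tuple per ratio pair
```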

Table 2. Comparison to current state-of-the-art methods on the Human3.6M validation set under protocol 1. Bold indicates the best results.
Table 3. Comparison to current state-of-the-art methods on the Human3.6M validation set under protocol 2. Bold indicates the best results.

The length of a limb can be computed as follows,

$$\begin{aligned} l(x_1,x_2) = \Vert x_1 - x_2 \Vert , \end{aligned}$$
(3)

where \(x_1\) and \(x_2\) stand for the 3d coordinates of the keypoints lying at the two ends of a limb. The limb length ratio between limb p and limb q can be computed as follows,

$$\begin{aligned} r(p,q) = \frac{l(p_{x_1},p_{x_2})}{l(q_{x_1},q_{x_2})}, \end{aligned}$$
(4)

where \(p_{x_1}\), \(p_{x_2}\), \(q_{x_1}\) and \(q_{x_2}\) stand for the 3d keypoint coordinates lying at the ends of limb p and limb q respectively. Then the LLR loss can be written as

$$\begin{aligned} L_{LLR}(X) = \frac{1}{m} \sum _{i=1}^{m} \left( 1- \frac{1}{\sqrt{2\pi }\,s}\exp \left( -\frac{\left( r(X_{i_p},X_{i_q})-\overline{R}\right) ^2 }{2s^2}\right) \right) , \end{aligned}$$
(5)

where \(X_{i_p}\) and \(X_{i_q}\) denote the two limbs of the i-th ratio pair, and \(\overline{R}\) and s denote the mean and standard deviation of the limb length ratio \(r(X_{i_p},X_{i_q})\) computed on the training set. We use the Gaussian function to penalize deviations of the ratio from its mean. The final loss function is then

$$\begin{aligned} Loss = \alpha L_{CTF} + \beta L_{LLR}, \end{aligned}$$
(6)

where \(\alpha \) and \(\beta \) are hyper-parameters and denote scale coefficients of the corresponding loss items.
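A PyTorch sketch of Eqs. (3)-(6) follows, building on the `ratio_stats` sketch above. The pair list, the statistics, and the default values of the weights are assumptions.

```python
import math
import torch

def limb_length(x1, x2):
    """Eq. (3): Euclidean length of a limb; x1, x2 are (B, 3) joint tensors."""
    return (x1 - x2).norm(dim=-1)

def llr_loss(pred, pairs, stats):
    """Eqs. (4)-(5): penalize limb length ratios that drift from the
    training-set statistics. pred: (B, 16, 3) predicted 3d joints;
    pairs: list of ((a1, a2), (b1, b2)) joint-index tuples (assumed);
    stats: list of (R_bar, s) per pair, precomputed on the training set."""
    total = 0.0
    for ((a1, a2), (b1, b2)), (r_bar, s) in zip(pairs, stats):
        r = limb_length(pred[:, a1], pred[:, a2]) \
            / limb_length(pred[:, b1], pred[:, b2])       # Eq. (4)
        gauss = torch.exp(-((r - r_bar) ** 2) / (2 * s ** 2)) \
            / (s * math.sqrt(2 * math.pi))
        total = total + (1.0 - gauss).mean()              # Eq. (5)
    return total / len(pairs)

def total_loss(l_ctf, l_llr, alpha=1.0, beta=0.1):
    """Eq. (6); the values of alpha and beta here are placeholders."""
    return alpha * l_ctf + beta * l_llr
```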

4 Experiments

In this section, we first describe the implementation details, followed by experimental results on the Human3.6M dataset [11]. In addition, intuitive comparisons between our model and benchmark methods are presented.

Table 4. Comparison of the baseline and our method w.r.t the prediction errors of medium and hard keypoints.
Fig. 4. Qualitative results of our method on the Human3.6M dataset. Each row contains two samples, and each sample contains four columns: the RGB image, the 2d human pose prediction produced by the stacked hourglass model [18], the 3d human pose prediction of our method, and the 3d ground truth, in turn. To present the 3d predictions more clearly, we rotate the figures in the third and fourth columns slightly around the y axis.

Fig. 5. Qualitative results on the MPII dataset [2]. Each row contains two samples, and each sample includes three columns: the RGB image with the corresponding 2d human pose prediction, the 3d predictions of [14], and the 3d predictions of our method, in turn.

4.1 Dataset

We conduct experiments on the Human3.6M dataset to demonstrate the performance of our method. Human3.6M is a widely used dataset in the field of 3d human pose estimation and contains comprehensive annotations. The Human3.6M data are collected in a laboratory environment, covering 11 professional actors and 17 scenarios. 3d human keypoint position annotations are obtained from a high-speed motion capture system with 4 calibrated cameras. In this paper, we choose 5 actors as the training set and 2 actors as the validation set, consistent with widely used protocols [12, 14, 27]. It is worth mentioning that we do not leverage temporal information, considering real-time performance.

4.2 Implementation Details

In our coarse-to-fine method, we use the predictions of stacked hourglass [18], a state-of-the-art 2d human pose estimation method, as input. A stacked hourglass prediction includes 16 keypoints. During data preprocessing, we reshape each 2d human pose prediction to a vector of shape \(1\times 32\) and the corresponding 3d human pose ground truth to a vector of shape \(1\times 48\). The 3d ground-truth coordinates are transformed to the camera coordinate system. To facilitate comparison with other methods, we set the keypoint Hip, the midpoint of the left hip and right hip, as the coordinate system origin, following [9, 14]. To make the model easier to converge, we normalize the 2d pose predictions and 3d pose ground truth with the mean and variance calculated on the training set. To avoid gradient explosion, we clip the maximum L2 norm of the gradient at every backpropagation step. The model is trained with a batch size of 128 for 1.22 million iterations in total; the initial learning rate is set to \(1\times 10^{-3}\) and decayed by a factor of 0.96 every 10k iterations.
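As a rough illustration, the sketch below wires the reported learning rate and decay schedule into a training step, reusing `CoarseToFine`, `ctf_loss`, `llr_loss`, and `total_loss` from the sketches in Sect. 3. The optimizer choice (Adam), the clipping threshold, and the placeholders `PAIRS` and `STATS` are assumptions.

```python
import torch

model = CoarseToFine()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam is assumed
scheduler = torch.optim.lr_scheduler.StepLR(                # x0.96 every 10k iters
    optimizer, step_size=10_000, gamma=0.96)

def train_step(kp2d, gts):                  # one batch of 128 samples
    preds = model(kp2d)                     # (easy, med, hard) predictions
    pred3d = torch.cat(preds, dim=1).view(-1, 16, 3)  # joint order is assumed
    loss = total_loss(ctf_loss(preds, gts), llr_loss(pred3d, PAIRS, STATS))
    optimizer.zero_grad()
    loss.backward()
    # Clip the gradient's L2 norm to avoid explosion; the threshold is assumed.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    return loss.item()
```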

All experiments are conducted on one Nvidia Tesla K80 GPU with 12 GB of memory.

4.3 Comparison with State-of-the-Art Methods

In Table 2, we present the results of our method and compare them with state-of-the-art methods under protocol 1. Our coarse-to-fine method clearly performs well on the Human3.6M dataset. When combined with the LLR loss, the performance of our method is further improved, decreasing the average error to 60.6 mm. Under protocol 2, rigid alignment is applied to the predictions, and our method outperforms the comparison methods on every action, as shown in Table 3. Table 4 shows that our method, which combines the LLR loss with the coarse-to-fine model, outperforms the baseline when predicting medium and hard keypoints. Figure 4 presents some examples of our predicted 3d human poses on the Human3.6M dataset.

To explore the generalization performance of our method, we conduct qualitative experiments on the MPII dataset [2] and compare our method with [14], as shown in Fig. 5. In most situations, our method produces more reasonable predictions than [14], even in wild scenes. However, it is worth mentioning that occlusion of 2d joints has a large negative impact on the 3d predictions.

5 Conclusion

In this paper, we propose a coarse-to-fine method for 3d human pose estimation, together with a set of limb length ratio constraints based on human structure. Experimental results indicate that our method is effective, particularly when predicting challenging keypoints far from the torso. Encouraged by the current results, we will investigate how to exploit context information to further improve performance.