METRO-X: Combining Vertex and Parameter Regressions for Recovering 3D Human Meshes with Full Motions

Li, Guiqing; Yao, Chenhao; Zhang, Huiqian; Zeng, Juncheng; Nie, Yongwei; Xian, Chuhua

doi:10.1007/978-3-031-50072-5_4

Guiqing Li¹²,
Chenhao Yao¹²,
Huiqian Zhang¹²,
Juncheng Zeng¹²,
Yongwei Nie¹² &
…
Chuhua Xian¹²

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14496))

Included in the following conference series:

Computer Graphics International Conference

Abstract

It is well known that regressing the parametric representation of a human body from a single image suffers low accuracy due to sparse information use and error accumulation. Although being able to achieve higher accuracy by avoiding these issues, directly regressing vertices may result in vertex outliers. We present METRO-X, a novel method for reconstructing full-body human meshes with body pose, facial expression and hand gesture from a single image, which combines the advantages from the two disciplines so as to achieve higher accuracy than parameter regression while bear denser vertices and generate smoother shape than vertices regression. It first detects and extracts hands, head and the whole body parts from a given image, then regresses the vertices of three parts separately using METRO, and finally fits SMPL-X to the reconstructed meshes to obtain the complete parametric representation. Experimental results show that METRO-X outperforms the ExPose method, with a significant 23% improvement in body accuracy and a 35% improvement in gesture accuracy. These results demonstrate the potential of our approach in enabling various applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Hierarchical Kinematic Human Mesh Recovery

STAR: Sparse Trained Articulated Human Body Regressor

Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement

References

Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693 (2014)
Google Scholar
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34
Chapter Google Scholar
Chen, D., Song, Y., Liang, F., Ma, T., Zhu, X., Jia, T.: 3D human body reconstruction based on SMPL model. Vis. Comput. 39(5), 1893–1906 (2022). https://doi.org/10.1007/s00371-022-02453-x
Cho, J., Youwang, K., Oh, T.H.: Cross-attention of disentangled modalities for 3D human mesh recovery with transformers. In: Avidan, S., Brostow, G., Cisse, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. LNCS, vol. 13661, pp. 342–359. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_20
Choutas, V., Pavlakos, G., Bolkart, T., Tzionas, D., Black, M.J.: Monocular expressive body regression through body-driven attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 20–40. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_2
Chapter Google Scholar
Feng, Y., Choutas, V., Bolkart, T., Tzionas, D., Black, M.J.: Collaborative regression of expressive bodies using moderation. In: 2021 International Conference on 3D Vision (3DV), pp. 792–804 (2021)
Google Scholar
Feng, Y., Feng, H., Black, M.J., Bolkart, T.: Learning an animatable detailed 3D face model from in-the-wild images. ACM Trans. Graph. 40(4), 1–13 (2021)
Article Google Scholar
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3. 6 m: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2013)
Google Scholar
Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British Machine Vision Conference, pp. 12.1–12.11 (2010)
Google Scholar
Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1465–1472 (2011)
Google Scholar
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131 (2018)
Google Scholar
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4396–4405 (2019). https://doi.org/10.1109/CVPR.2019.00453
Kocabas, M., Athanasiou, N., Black, M.J.: VIBE: video inference for human body pose and shape estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5253–5263 (2020)
Google Scholar
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2252–2261 (2019)
Google Scholar
Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4501–4510 (2019)
Google Scholar
Li, M., et al.: Interacting attention graph for single image two-hand reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2761–2770 (2022)
Google Scholar
Li, X., Li, G., Li, T., Lv, J., Mitrouchev, P.: Remodeling of mannequins based on automatic binding of mesh to anthropometric parameters. Vis. Comput. 1–24 (2022)
Google Scholar
Li, Z., Liu, J., Zhang, Z., Xu, S., Yan, Y.: CLIFF: carrying location information in full frames into human pose and shape estimation. arXiv preprint arXiv:2208.00571 (2022)
Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1954–1963 (2021)
Google Scholar
Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 12939–12948 (2021)
Google Scholar
Lin, T.Y., et al.: Microsoft coco: Common objects in context. In: Proceedings of the European Conference on Computer Vision, pp. 740–755 (2014)
Google Scholar
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 1–16 (2015)
Article Google Scholar
von Marcard, T., Henschel, R., Black, M., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3d human pose in the wild using IMUs and a moving camera. In: European Conference on Computer Vision (ECCV), September 2018
Google Scholar
Mehta, D., et al.: Monocular 3D human pose estimation in the wild using improved CNN supervision. In: 2017 International Conference on 3D Vision (3DV), pp. 506–516 (2017). https://doi.org/10.1109/3DV.2017.00064
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019)
Google Scholar
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019)
Google Scholar
Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 459–468 (2018)
Google Scholar
Ranjan, A., Bolkart, T., Sanyal, S., Black, M.J.: Generating 3D faces using convolutional mesh autoencoders. In: Proceedings of the European Conference on Computer Vision, pp. 704–720 (2018)
Google Scholar
Varol, G., et al.: Bodynet: volumetric inference of 3D human body shapes. In: Proceedings of the European Conference on Computer Vision, pp. 20–36 (2018)
Google Scholar
Wan, Z., Li, Z., Tian, M., Liu, J., Yi, S., Li, H.: Encoder-decoder with multi-level attention for 3D human shape and pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 13033–13042 (2021)
Google Scholar
Wang, K., Zhang, G., Yang, J., Bao, H.: Dynamic human body reconstruction and motion tracking with low-cost depth cameras. Vis. Comput. 37, 603–618 (2021)
Article Google Scholar
Wei, W.L., Lin, J.C., Liu, T.L., Liao, H.Y.M.: Capturing humans in motion: temporal-attentive 3D human pose and shape estimation from monocular video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 13211–13220 (2022)
Google Scholar
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Zhou, X., Zhu, M., Pavlakos, G., Leonardos, S., Derpanis, K.G., Daniilidis, K.: MonoCap: monocular human motion capture using a CNN coupled with a geometric prior. IEEE Trans. Pattern Anal. Mach. Intell. 41(4), 901–914 (2019). https://doi.org/10.1109/TPAMI.2018.2816031
Article Google Scholar
Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., Brox, T.: FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 813–822 (2019)
Google Scholar

Download references

Author information

Authors and Affiliations

South China University of Technology, Guangzhou, 510006, China
Guiqing Li, Chenhao Yao, Huiqian Zhang, Juncheng Zeng, Yongwei Nie & Chuhua Xian

Authors

Guiqing Li
View author publications
You can also search for this author in PubMed Google Scholar
Chenhao Yao
View author publications
You can also search for this author in PubMed Google Scholar
Huiqian Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Juncheng Zeng
View author publications
You can also search for this author in PubMed Google Scholar
Yongwei Nie
View author publications
You can also search for this author in PubMed Google Scholar
Chuhua Xian
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Guiqing Li .

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Bin Sheng
Shanghai Jiao Tong University, Shanghai, China
Lei Bi
University of Sydney, Sydney, NSW, Australia
Jinman Kim
MIRALab-CUI, University of Geneve, Carouge, Geneve, Switzerland
Nadia Magnenat-Thalmann
Swiss Federal Institute of Technology, Lausanne, Switzerland
Daniel Thalmann

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, G., Yao, C., Zhang, H., Zeng, J., Nie, Y., Xian, C. (2024). METRO-X: Combining Vertex and Parameter Regressions for Recovering 3D Human Meshes with Full Motions. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14496. Springer, Cham. https://doi.org/10.1007/978-3-031-50072-5_4

Download citation

DOI: https://doi.org/10.1007/978-3-031-50072-5_4
Published: 29 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50071-8
Online ISBN: 978-3-031-50072-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics