METRO-X: Combining Vertex and Parameter Regressions for Recovering 3D Human Meshes with Full Motions

  • Conference paper
Advances in Computer Graphics (CGI 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14496)

Abstract

It is well known that regressing the parametric representation of a human body from a single image suffers from low accuracy due to sparse use of information and error accumulation. Directly regressing vertices avoids these issues and can achieve higher accuracy, but it may produce vertex outliers. We present METRO-X, a novel method for reconstructing full-body human meshes with body pose, facial expression and hand gesture from a single image. It combines the advantages of the two approaches, achieving higher accuracy than parameter regression while producing denser vertices and smoother shapes than vertex regression. It first detects and extracts the hands, the head and the whole body from a given image, then regresses the vertices of the three parts separately using METRO, and finally fits SMPL-X to the reconstructed meshes to obtain the complete parametric representation. Experimental results show that METRO-X outperforms the ExPose method, with a significant 23% improvement in body accuracy and a 35% improvement in gesture accuracy. These results demonstrate the potential of our approach in enabling various applications.
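The final step described in the abstract, fitting a parametric model to regressed vertices, can be illustrated with a toy sketch. This is not the authors' implementation and it does not use SMPL-X: it assumes a hypothetical linear shape model (a template mesh plus a small basis of offsets) and a ridge-regularized least-squares fit, with random data standing in for regressed vertices. The function name `fit_linear_shape_model` and all sizes and noise levels are illustrative assumptions. The sketch only shows why fitting a low-dimensional model can suppress vertex outliers.

```python
import numpy as np

def fit_linear_shape_model(template, basis, target, reg=1e-3):
    """Fit beta so that template + sum_k beta[k] * basis[k] approximates target.

    template: (N, 3) mean mesh, basis: (K, N, 3) offsets, target: (N, 3) vertices.
    Hypothetical toy model; a real body model such as SMPL-X is nonlinear in pose.
    """
    K = basis.shape[0]
    A = basis.reshape(K, -1).T            # (3N, K) design matrix
    b = (target - template).reshape(-1)   # (3N,) residual to explain
    # Ridge-regularized normal equations: (A^T A + reg * I) beta = A^T b
    beta = np.linalg.solve(A.T @ A + reg * np.eye(K), A.T @ b)
    fitted = template + np.tensordot(beta, basis, axes=1)
    return beta, fitted

# Synthetic stand-in for "regressed vertices": a clean model mesh plus noise
# and a few gross outliers (all values here are made up for illustration).
rng = np.random.default_rng(0)
N, K = 200, 5
template = rng.normal(size=(N, 3))
basis = rng.normal(size=(K, N, 3))
true_beta = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
clean = template + np.tensordot(true_beta, basis, axes=1)
noisy = clean + 0.01 * rng.normal(size=(N, 3))
noisy[:3] += 5.0                          # three outlier vertices

beta, fitted = fit_linear_shape_model(template, basis, noisy)
err_noisy = np.abs(noisy - clean).max()   # worst-case error before fitting
err_fitted = np.abs(fitted - clean).max() # worst-case error after fitting
print(f"max error before fit: {err_noisy:.3f}, after fit: {err_fitted:.3f}")
```

Because the fitted mesh is constrained to the model's low-dimensional space, the few outlier vertices have limited influence on the result, which is the intuition behind combining vertex regression with a parametric fit.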

References

  1. Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693 (2014)

  2. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34

  3. Chen, D., Song, Y., Liang, F., Ma, T., Zhu, X., Jia, T.: 3D human body reconstruction based on SMPL model. Vis. Comput. 39(5), 1893–1906 (2022). https://doi.org/10.1007/s00371-022-02453-x

  4. Cho, J., Youwang, K., Oh, T.H.: Cross-attention of disentangled modalities for 3D human mesh recovery with transformers. In: Avidan, S., Brostow, G., Cisse, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13661, pp. 342–359. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_20

  5. Choutas, V., Pavlakos, G., Bolkart, T., Tzionas, D., Black, M.J.: Monocular expressive body regression through body-driven attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 20–40. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_2

  6. Feng, Y., Choutas, V., Bolkart, T., Tzionas, D., Black, M.J.: Collaborative regression of expressive bodies using moderation. In: 2021 International Conference on 3D Vision (3DV), pp. 792–804 (2021)

  7. Feng, Y., Feng, H., Black, M.J., Bolkart, T.: Learning an animatable detailed 3D face model from in-the-wild images. ACM Trans. Graph. 40(4), 1–13 (2021)

  8. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2013)

  9. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British Machine Vision Conference, pp. 12.1–12.11 (2010)

  10. Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1465–1472 (2011)

  11. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131 (2018)

  12. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4396–4405 (2019). https://doi.org/10.1109/CVPR.2019.00453

  13. Kocabas, M., Athanasiou, N., Black, M.J.: VIBE: video inference for human body pose and shape estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5253–5263 (2020)

  14. Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2252–2261 (2019)

  15. Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4501–4510 (2019)

  16. Li, M., et al.: Interacting attention graph for single image two-hand reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2761–2770 (2022)

  17. Li, X., Li, G., Li, T., Lv, J., Mitrouchev, P.: Remodeling of mannequins based on automatic binding of mesh to anthropometric parameters. Vis. Comput. 1–24 (2022)

  18. Li, Z., Liu, J., Zhang, Z., Xu, S., Yan, Y.: CLIFF: carrying location information in full frames into human pose and shape estimation. arXiv preprint arXiv:2208.00571 (2022)

  19. Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1954–1963 (2021)

  20. Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 12939–12948 (2021)

  21. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Proceedings of the European Conference on Computer Vision, pp. 740–755 (2014)

  22. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 1–16 (2015)

  23. von Marcard, T., Henschel, R., Black, M., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3D human pose in the wild using IMUs and a moving camera. In: Proceedings of the European Conference on Computer Vision (2018)

  24. Mehta, D., et al.: Monocular 3D human pose estimation in the wild using improved CNN supervision. In: 2017 International Conference on 3D Vision (3DV), pp. 506–516 (2017). https://doi.org/10.1109/3DV.2017.00064

  25. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019)

  26. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019)

  27. Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 459–468 (2018)

  28. Ranjan, A., Bolkart, T., Sanyal, S., Black, M.J.: Generating 3D faces using convolutional mesh autoencoders. In: Proceedings of the European Conference on Computer Vision, pp. 704–720 (2018)

  29. Varol, G., et al.: BodyNet: volumetric inference of 3D human body shapes. In: Proceedings of the European Conference on Computer Vision, pp. 20–36 (2018)

  30. Wan, Z., Li, Z., Tian, M., Liu, J., Yi, S., Li, H.: Encoder-decoder with multi-level attention for 3D human shape and pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 13033–13042 (2021)

  31. Wang, K., Zhang, G., Yang, J., Bao, H.: Dynamic human body reconstruction and motion tracking with low-cost depth cameras. Vis. Comput. 37, 603–618 (2021)

  32. Wei, W.L., Lin, J.C., Liu, T.L., Liao, H.Y.M.: Capturing humans in motion: temporal-attentive 3D human pose and shape estimation from monocular video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 13211–13220 (2022)

  33. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2

  34. Zhou, X., Zhu, M., Pavlakos, G., Leonardos, S., Derpanis, K.G., Daniilidis, K.: MonoCap: monocular human motion capture using a CNN coupled with a geometric prior. IEEE Trans. Pattern Anal. Mach. Intell. 41(4), 901–914 (2019). https://doi.org/10.1109/TPAMI.2018.2816031

  35. Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., Brox, T.: FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 813–822 (2019)

Author information

Corresponding author

Correspondence to Guiqing Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, G., Yao, C., Zhang, H., Zeng, J., Nie, Y., Xian, C. (2024). METRO-X: Combining Vertex and Parameter Regressions for Recovering 3D Human Meshes with Full Motions. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14496. Springer, Cham. https://doi.org/10.1007/978-3-031-50072-5_4

  • DOI: https://doi.org/10.1007/978-3-031-50072-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50071-8

  • Online ISBN: 978-3-031-50072-5

  • eBook Packages: Computer Science, Computer Science (R0)
