A Review of Human Mesh Reconstruction: Beyond 2D Video Object Segmentation

  • Conference paper
Social Robotics (ICSR + InnoBiz 2024)

Abstract

Video object segmentation aims to extract 2D object masks by segmenting video frames into multiple objects, which is crucial in practical applications such as medical imaging. However, traditional video object segmentation methods produce 2D masks, which are unsuitable for 3D scenarios where depth information is essential, such as robotic grasping, virtual reality, and autonomous driving. In this paper, we present a comprehensive review of 3D human mesh reconstruction (HMR) as an extension beyond 2D video object segmentation. We begin by reviewing mainstream video object segmentation methods and then transition from 2D video object segmentation to 3D HMR. We further categorize recent HMR methods by the key characteristics that define this research field, including the type of model input and the use of statistical models. Finally, we provide detailed information on HMR datasets and evaluation metrics.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 62106128 and 62101309), the Natural Science Foundation of Shandong Province (Nos. ZR2021QF001 and ZR2021QF109), the Shandong Province Science and Technology Small and Medium-sized Enterprise Innovation Capacity Enhancement Project (2023TSGC0115), and the Shandong Province Higher Education Institutions Youth Entrepreneurship and Technology Support Program (2023KJ027).

Author information

Corresponding author

Correspondence to Xiankai Lu.

Ethics declarations

Disclosure of Interests

The authors of this paper have no competing interests.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wu, P. et al. (2025). A Review of Human Mesh Reconstruction: Beyond 2D Video Object Segmentation. In: Li, H., et al. Social Robotics. ICSR + InnoBiz 2024. Lecture Notes in Computer Science, vol 15170. Springer, Singapore. https://doi.org/10.1007/978-981-96-1151-5_17

  • DOI: https://doi.org/10.1007/978-981-96-1151-5_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-1150-8

  • Online ISBN: 978-981-96-1151-5

  • eBook Packages: Computer Science, Computer Science (R0)
