Abstract
Video object segmentation aims to extract 2D object masks by segmenting video frames into multiple objects, which is crucial in various practical applications such as medical imaging, etc.. However, traditional video object segmentation methods produce 2D masks, which are not suitable for 3D scenarios where depth information is essential, such as in robotic grasping, virtual reality, and autonomous driving, etc.. In this paper, we present a comprehensive review of 3D human mesh reconstruction (HMR) as an extension beyond 2D video object segmentation. We begin by reviewing the mainstream video object segmentation methods, then transition from 2D video object segmentation to 3D HMR. We further categorize recent HMR methods based on key characteristics that define this research field, including the type of model input and the use of statistical models. Finally, we provide detailed information on HMR datasets and evaluation metrics.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Choi, H., Moon, G., Chang, J.Y., Lee, K.M.: Beyond static features for temporally consistent 3d human pose and shape from a video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1964–1973 (2021)
Choi, H., Moon, G., Lee, K.M.: Pose2Mesh: graph convolutional network for 3D human pose and mesh recovery from a 2D human pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 769–787. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_45
Doersch, C., Zisserman, A.: Sim2real transfer learning for 3d human pose estimation: motion to the rescue. Adv. Neural Inf. Process. Syst. 32 (2019)
Dwivedi, S.K., Athanasiou, N., Kocabas, M., Black, M.J.: Learning to regress bodies from images using differentiable semantic rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11250–11259 (2021)
Guan, S., Xu, J., He, M.Z., Wang, Y., Ni, B., Yang, X.: Out-of-domain human mesh reconstruction via dynamic bilevel online adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 45(4), 5070–5086 (2022)
Guan, S., Xu, J., Wang, Y., Ni, B., Yang, X.: Bilevel online adaptation for out-of-domain human mesh reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10472–10481 (2021)
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3. 6m: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2013)
Joo, H., Neverova, N., Vedaldi, A.: Exemplar fine-tuning for 3d human model fitting towards in-the-wild 3d human pose estimation. In: 2021 International Conference on 3D Vision (3DV), pp. 42–52. IEEE (2021)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131 (2018)
Kanazawa, A., Zhang, J.Y., Felsen, P., Malik, J.: Learning 3d human dynamics from video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5614–5623 (2019)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Kocabas, M., Athanasiou, N., Black, M.J.: Vibe: video inference for human body pose and shape estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5253–5263 (2020)
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2252–2261 (2019)
Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4501–4510 (2019)
Kolotouros, N., Pavlakos, G., Jayaraman, D., Daniilidis, K.: Probabilistic modeling for human mesh recovery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11605–11614 (2021)
Lee, G.H., Lee, S.W.: Uncertainty-aware human mesh recovery from video by learning part-based 3d dynamics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12375–12384 (2021)
Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1954–1963 (2021)
Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12939–12948 (2021)
Mehta, D., et al.: Monocular 3d human pose estimation in the wild using improved cnn supervision. In: 2017 International Conference on 3D Vision (3DV), pp. 506–516. IEEE (2017)
Moon, G., Lee, K.M.: I2L-MeshNet: image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 752–768. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_44
Nagaraja, N.S., Schmidt, F.R., Brox, T.: Video segmentation with just a few strokes. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3235–3243 (2015)
Sun, Y., Ye, Y., Liu, W., Gao, W., Fu, Y., Mei, T.: Human mesh recovery from monocular images via a skeleton-disentangled representation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5349–5358 (2019)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Von Marcard, T., Henschel, R., Black, M.J., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3d human pose in the wild using imus and a moving camera. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 601–617 (2018)
Wu, P., Lu, X., Shen, J., Yin, Y.: Clip fusion with bi-level optimization for human mesh reconstruction from monocular videos. In: Proceedings of the 31st ACM International Conference on Multimedia, pp. 105–115 (2023)
Yu, Z., et al.: Skeleton2mesh: kinematics prior injected unsupervised human mesh recovery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8619–8629 (2021)
Zanfir, A., Bazavan, E.G., Zanfir, M., Freeman, W.T., Sukthankar, R., Sminchisescu, C.: Neural descent for visual 3d human pose and shape. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14484–14493 (2021)
Zanfir, M., Zanfir, A., Bazavan, E.G., Freeman, W.T., Sukthankar, R., Sminchisescu, C.: Thundr: transformer-based 3d human reconstruction with markers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12971–12980 (2021)
Zhang, F., Zhu, X., Dai, H., Ye, M., Zhu, C.: Distribution-aware coordinate representation for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7093–7102 (2020)
Zhang, H., et al.: Pymaf: 3d human pose and shape regression with pyramidal mesh alignment feedback loop. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11446–11456 (2021)
Zheng, C., Mendieta, M., Wang, P., Lu, A., Chen, C.: A lightweight graph transformer network for human mesh reconstruction from 2d human pose. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 5496–5507 (2022)
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 62106128, 62101309), the Natural Science Foundation of Shandong Province (No. ZR2021QF001, ZR2021QF109), Shandong Province Science and Technology Small and Medium-sized Enterprise Innovation Capacity Enhancement Project (2023TSGC0115), Shandong Province Higher Education Institutions Youth Entrepreneurship and Technology Support Program (2023KJ027).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Ethics declarations
Disclosure of Interests
The authors of this paper have no competing interests.
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, P. et al. (2025). A Review of Human Mesh Reconstruction: Beyond 2D Video Object Segmentation. In: Li, H., et al. Social Robotics. ICSR + InnoBiz 2024. Lecture Notes in Computer Science(), vol 15170. Springer, Singapore. https://doi.org/10.1007/978-981-96-1151-5_17
Download citation
DOI: https://doi.org/10.1007/978-981-96-1151-5_17
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-1150-8
Online ISBN: 978-981-96-1151-5
eBook Packages: Computer ScienceComputer Science (R0)