A Review of Human Mesh Reconstruction: Beyond 2D Video Object Segmentation

Wu, Peng; Wang, Zhicheng; Pan, Feiyu; Li, Fangkai; Hu, Hao; Lu, Xiankai; Guo, Yiyou

doi:10.1007/978-981-96-1151-5_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15170)

Included in the following conference series:

International Conference on Social Robotics

Abstract

Video object segmentation aims to extract 2D object masks by segmenting video frames into multiple objects, which is crucial in practical applications such as medical imaging. However, traditional video object segmentation methods produce 2D masks, which are not suitable for 3D scenarios where depth information is essential, such as robotic grasping, virtual reality, and autonomous driving. In this paper, we present a comprehensive review of 3D human mesh reconstruction (HMR) as an extension beyond 2D video object segmentation. We begin by reviewing the mainstream video object segmentation methods, then transition from 2D video object segmentation to 3D HMR. We further categorize recent HMR methods by the key characteristics that define this research field, including the type of model input and the use of statistical models. Finally, we provide detailed information on HMR datasets and evaluation metrics.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 62106128, 62101309), the Natural Science Foundation of Shandong Province (Nos. ZR2021QF001, ZR2021QF109), the Shandong Province Science and Technology Small and Medium-sized Enterprise Innovation Capacity Enhancement Project (2023TSGC0115), and the Shandong Province Higher Education Institutions Youth Entrepreneurship and Technology Support Program (2023KJ027).

Author information

Authors and Affiliations

School of Software, Shandong University, Jinan, 250000, Shandong, China
Peng Wu, Zhicheng Wang, Feiyu Pan, Fangkai Li, Hao Hu & Xiankai Lu
School of Mathematics and Computer Science, Fujian Provincial Key Laboratory of Data-Intensive Computing, Fujian University Laboratory of Intelligent Computing and Information Processing, Quanzhou Normal University, Quanzhou, China
Yiyou Guo

Corresponding author

Correspondence to Xiankai Lu.

Editor information

Editors and Affiliations

The Chinese University of Hong Kong, Shenzhen, China
Haizhou Li
University of Bremen, Bremen, Germany
Tanja Schultz
Shenzhen Institute of Advanced Technology, Shenzhen, China
Yalei Bi
The Chinese University of Hong Kong, Shenzhen, China
Jian Zhu
The University of Alabama, Tuscaloosa, AL, USA
Hongsheng He
The Hong Kong University of Science, Guangzhou, China
Jun Ma
National University of Singapore, Singapore, Singapore
Siqi Cai
Qingdao University, Qingdao, China
Wanyue Jiang
National University of Singapore, Singapore, Singapore
Shuzhi Sam Ge

Ethics declarations

Disclosure of Interests

The authors of this paper have no competing interests.

About this paper

Cite this paper

Wu, P. et al. (2025). A Review of Human Mesh Reconstruction: Beyond 2D Video Object Segmentation. In: Li, H., et al. Social Robotics. ICSR + InnoBiz 2024. Lecture Notes in Computer Science, vol 15170. Springer, Singapore. https://doi.org/10.1007/978-981-96-1151-5_17

DOI: https://doi.org/10.1007/978-981-96-1151-5_17
Published: 07 February 2025
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-1150-8
Online ISBN: 978-981-96-1151-5
eBook Packages: Computer Science, Computer Science (R0)
