
LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment

Conference paper in Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15087)


Abstract

LiDAR-based human motion capture has garnered significant interest in recent years for its practicality in large-scale, unconstrained environments. However, most methods rely on cleanly segmented human point clouds as input; when faced with noisy data, the accuracy and smoothness of their motion results degrade, rendering them unsuitable for practical applications. To address these limitations and enhance the robustness and precision of motion capture under noise interference, we introduce LiveHPS++, an innovative and effective solution based on a single LiDAR system. Benefiting from three meticulously designed modules, our method learns dynamic and kinematic features from human movements and enables the precise capture of coherent human motions in open settings, making it highly applicable to real-world scenarios. Extensive experiments show that LiveHPS++ significantly surpasses existing state-of-the-art methods across various datasets, establishing a new benchmark in the field. Project page: https://4dvlab.github.io/project_page/LiveHPS2.html

This work was supported by NSFC (No. 62206173), MoE Key Laboratory of Intelligent Perception and Human-Machine Collaboration (ShanghaiTech University), Shanghai Frontiers Science Center of Human-centered Artificial Intelligence (ShangHAI).




Author information


Corresponding author

Correspondence to Yuexin Ma.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ren, Y., Han, X., Yao, Y., Long, X., Sun, Y., Ma, Y. (2025). LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15087. Springer, Cham. https://doi.org/10.1007/978-3-031-73397-0_8


  • DOI: https://doi.org/10.1007/978-3-031-73397-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73396-3

  • Online ISBN: 978-3-031-73397-0

  • eBook Packages: Computer Science, Computer Science (R0)
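
For convenience, the citation details above can be assembled into a BibTeX entry. The sketch below uses only the data listed on this page; the entry key is an arbitrary placeholder, and the page range is omitted because it is not given here:

  @inproceedings{ren2025livehps2,
    author    = {Ren, Y. and Han, X. and Yao, Y. and Long, X. and Sun, Y. and Ma, Y.},
    editor    = {Leonardis, A. and Ricci, E. and Roth, S. and Russakovsky, O. and Sattler, T. and Varol, G.},
    title     = {{LiveHPS++}: Robust and Coherent Motion Capture in Dynamic Free Environment},
    booktitle = {Computer Vision -- ECCV 2024},
    series    = {Lecture Notes in Computer Science},
    volume    = {15087},
    publisher = {Springer},
    address   = {Cham},
    year      = {2025},
    doi       = {10.1007/978-3-031-73397-0_8},
  }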
