
LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment

Conference paper in Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15087)


Abstract

LiDAR-based human motion capture has garnered significant interest in recent years for its practicality in large-scale, unconstrained environments. However, most methods rely on cleanly segmented human point clouds as input; when faced with noisy data, the accuracy and smoothness of their motion results degrade, rendering them unsuitable for practical applications. To address these limitations and enhance the robustness and precision of motion capture under noise interference, we introduce LiveHPS++, an innovative and effective solution based on a single LiDAR system. Benefiting from three meticulously designed modules, our method learns dynamic and kinematic features from human movements and enables the precise capture of coherent human motions in open settings, making it highly applicable to real-world scenarios. Extensive experiments show that LiveHPS++ significantly surpasses existing state-of-the-art methods across various datasets, establishing a new benchmark in the field. Project page: https://4dvlab.github.io/project_page/LiveHPS2.html

This work was supported by NSFC (No. 62206173), MoE Key Laboratory of Intelligent Perception and Human-Machine Collaboration (ShanghaiTech University), Shanghai Frontiers Science Center of Human-centered Artificial Intelligence (ShangHAI).




Author information


Corresponding author

Correspondence to Yuexin Ma.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ren, Y., Han, X., Yao, Y., Long, X., Sun, Y., Ma, Y. (2025). LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15087. Springer, Cham. https://doi.org/10.1007/978-3-031-73397-0_8


  • DOI: https://doi.org/10.1007/978-3-031-73397-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73396-3

  • Online ISBN: 978-3-031-73397-0

  • eBook Packages: Computer Science, Computer Science (R0)
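
For convenience, the citation details above can be assembled into a BibTeX entry. The sketch below uses only the data listed on this page; the entry key is an arbitrary placeholder, and the page range is omitted because it is not given here:

  @inproceedings{ren2025livehps2,
    author    = {Ren, Y. and Han, X. and Yao, Y. and Long, X. and Sun, Y. and Ma, Y.},
    editor    = {Leonardis, A. and Ricci, E. and Roth, S. and Russakovsky, O. and Sattler, T. and Varol, G.},
    title     = {{LiveHPS++}: Robust and Coherent Motion Capture in Dynamic Free Environment},
    booktitle = {Computer Vision -- ECCV 2024},
    series    = {Lecture Notes in Computer Science},
    volume    = {15087},
    publisher = {Springer},
    address   = {Cham},
    year      = {2025},
    doi       = {10.1007/978-3-031-73397-0_8},
  }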
