TransPose: real-time 3D human translation and pose estimation with six inertial sensors

Published: 19 July 2021

Abstract

Motion capture is facing new possibilities brought by inertial sensing technologies, which, unlike vision-based solutions, do not suffer from occlusion or wide-range recording scenarios. However, because the recorded signals are sparse and quite noisy, online performance and global translation estimation turn out to be two key difficulties. In this paper, we present TransPose, a DNN-based approach that performs full motion capture (both global translations and body poses) from only six Inertial Measurement Units (IMUs) at over 90 fps. For body pose estimation, we propose a multi-stage network that estimates leaf-to-full joint positions as intermediate results. This design makes pose estimation much easier, achieving both better accuracy and lower computation cost. For global translation estimation, we propose a supporting-foot-based method and an RNN-based method, and robustly combine their outputs with a confidence-based fusion technique. Quantitative and qualitative comparisons show that our method outperforms state-of-the-art learning- and optimization-based methods by a large margin in both accuracy and efficiency. As a purely inertial-sensor-based approach, our method is not limited by environmental settings (e.g., fixed cameras), freeing the capture from common difficulties such as wide-range motion spaces and strong occlusion.
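To make the confidence-based fusion mentioned above concrete, the following Python sketch shows one plausible per-frame blend of the two translation estimates. It is an illustration, not the authors' implementation: it assumes the supporting-foot-based method and the RNN each produce a per-frame 3D translation, and that a foot-contact confidence in [0, 1] is available as the fusion weight; all function and variable names are hypothetical.

```python
# Minimal sketch (not the authors' code) of confidence-based fusion of two
# per-frame translation estimates. Assumptions: both estimators output a
# (T, 3) array of translations, and a (T,) foot-contact confidence in [0, 1]
# indicates how much the supporting-foot estimate can be trusted per frame.
import numpy as np

def fuse_translations(trans_foot, trans_rnn, contact_conf):
    """Blend the supporting-foot-based and RNN-based translation estimates."""
    w = np.clip(contact_conf, 0.0, 1.0)[:, None]    # (T, 1) fusion weight
    return w * trans_foot + (1.0 - w) * trans_rnn   # favor the foot estimate when contact is confident

# Toy usage on a 100-frame sequence with placeholder data.
T = 100
trans_foot = np.cumsum(np.full((T, 3), 0.01), axis=0)   # dummy supporting-foot estimate
trans_rnn = trans_foot + np.random.randn(T, 3) * 0.005  # dummy RNN estimate
contact_conf = np.random.rand(T)                         # dummy per-frame contact confidence
fused = fuse_translations(trans_foot, trans_rnn, contact_conf)
print(fused.shape)  # (100, 3)
```

The intent captured by this sketch is that a kinematics-derived translation is presumably more reliable while a foot is firmly planted, whereas the learned estimate takes over when contacts are uncertain; the exact weighting scheme used in the paper may differ.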


Supplemental Material

• a86-yi.mp4 (mp4, 84.1 MB)
• 3450626.3459786.mp4 (Presentation; mp4, 384.5 MB)


Published in

ACM Transactions on Graphics, Volume 40, Issue 4 (August 2021), 2170 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3450626

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

Publication History

• Published: 19 July 2021 in ACM Transactions on Graphics (TOG), Volume 40, Issue 4
