Tracking human-like natural motion by combining two deep recurrent neural networks with Kalman filter

  • Original Research Paper
Intelligent Service Robotics

Abstract

The Kinect skeleton tracker achieves considerable human body tracking performance in a convenient and low-cost manner. However, when self-occlusions occur, the tracker often captures unnatural human poses, such as discontinuous and vibrating movements. In this study, we propose a post-processing method that improves the skeleton produced by a single Kinect sensor by combining probabilistic filtering with supervised learning to correct unnatural tracked movements. Specifically, two deep recurrent neural networks improve the joint positions and joint velocities produced by the Kinect skeleton tracker, and a classic Kalman filter then further refines both. In addition, we propose a novel measure for evaluating the naturalness of captured joint trajectories. We evaluated the proposed approach by comparing its output with ground truth obtained from a commercial optical marker-based motion capture system.
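The abstract does not state the Kalman filter's state model or noise settings, so the following is only a minimal sketch of the refinement stage under stated assumptions: a constant-velocity model applied to one joint coordinate, with both the position and the velocity treated as measurements (standing in for the outputs of the two recurrent networks). The function name `kalman_refine`, the 30 Hz frame interval, and the noise parameters `q`, `r_pos`, and `r_vel` are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def kalman_refine(positions, velocities, dt=1.0 / 30.0,
                  q=1e-3, r_pos=1e-2, r_vel=1e-1):
    """Refine one joint coordinate with a constant-velocity Kalman filter.

    positions, velocities: 1-D sequences of per-frame measurements
    (hypothetical stand-ins for the two network outputs).
    Returns an (n, 2) array of refined [position, velocity] per frame.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    n = len(positions)

    # State x = [position, velocity]; assumed constant-velocity dynamics.
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    # Both state components are observed directly.
    H = np.eye(2)
    # Assumed process noise (white-acceleration model) and measurement noise.
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])
    R = np.diag([r_pos, r_vel])

    # Initialise the state from the first frame with a broad covariance.
    x = np.array([positions[0], velocities[0]])
    P = np.eye(2)
    out = np.empty((n, 2))
    out[0] = x

    for k in range(1, n):
        # Predict the state one frame ahead.
        x = F @ x
        P = F @ P @ F.T + Q

        # Correct the prediction with the current measurement.
        z = np.array([positions[k], velocities[k]])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out[k] = x

    return out
```

In this sketch each coordinate of each joint would be filtered independently, for example `refined = kalman_refine(px, vx)` for the x coordinate of one joint. Larger `r_pos` and `r_vel` make the output smoother but slower to follow real motion, so in practice they would have to be tuned against the sensor data.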

Acknowledgements

This work was supported by the Technology Innovation Industrial Program funded by the Ministry of Trade, (MI, South Korea) [10073161 & 10048320, Technology Innovation Program], as well as by an Institute for Information & communications Technology Promotion (IITP) grant funded by MSIT (No. 2018-0-00622).

Author information

Corresponding author

Correspondence to Il Hong Suh.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 7559 KB)

Supplementary material 2 (mp4 4990 KB)

About this article

Cite this article

Kim, J.B., Park, Y. & Suh, I.H. Tracking human-like natural motion by combining two deep recurrent neural networks with Kalman filter. Intel Serv Robotics 11, 313–322 (2018). https://doi.org/10.1007/s11370-018-0255-z

  • DOI: https://doi.org/10.1007/s11370-018-0255-z
