Real Time Hand Pose Estimation Using Depth Sensors

Consumer Depth Cameras for Computer Vision

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Real-time hand posture capture has long been a difficult goal in computer vision. Extracting hand skeleton parameters would be an important milestone for sign language recognition, since it would make it possible to classify hand shapes and gestures. The recent introduction of the Kinect depth sensor has accelerated research in human body pose capture. This chapter describes a real-time hand pose estimation method that employs an object-recognition-by-parts approach, and the use of this method for hand shape classification. First, a realistic 3D hand model is used to represent the hand with 21 different parts. Then, a random decision forest (RDF) is trained on synthetic depth images generated by animating the hand model; the trained forest performs per-pixel classification, assigning each pixel to a hand part. The classification results are fed into a local mode-finding algorithm to estimate the joint locations of the hand skeleton. The system processes depth images retrieved from Kinect in real time and does not rely on temporal information. As a simple application of the system, we also describe a support vector machine (SVM)-based recognition module for the ten digits of American Sign Language (ASL), which attains a recognition rate of 99.9% on live depth images in real time.
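The local mode-finding step can be illustrated with a short sketch. The abstract does not say which algorithm is used, so this example assumes mean shift with a Gaussian kernel, a common choice for this kind of task. The function name `mean_shift_mode`, the `bandwidth` parameter, and the use of per-pixel class probabilities as weights are illustrative assumptions, not details taken from the chapter.

```python
import math

def mean_shift_mode(points, weights, start, bandwidth=1.0, max_iter=50, tol=1e-6):
    """Climb from `start` to a local mode of a weighted Gaussian kernel density.

    points  : list of 3D positions of pixels assigned to one hand part (assumed input)
    weights : per-pixel weights, e.g. the classifier's confidence (assumed input)
    """
    current = tuple(start)
    for _ in range(max_iter):
        total = 0.0
        acc = [0.0] * len(current)
        for p, w in zip(points, weights):
            # Squared distance from the current estimate to this point.
            d2 = sum((a - b) ** 2 for a, b in zip(p, current))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            total += k
            for i, a in enumerate(p):
                acc[i] += k * a
        if total == 0.0:
            break  # no point has any influence here; keep the current estimate
        new = tuple(a / total for a in acc)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, current)))
        current = new
        if shift < tol:
            break
    return current
```

Under these assumptions, running the function once per hand part, starting from a rough estimate such as the weighted centroid of that part's pixels, would give one candidate joint location per part while down-weighting isolated misclassified pixels.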

Corresponding author

Correspondence to Cem Keskin.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Keskin, C., Kıraç, F., Kara, Y.E., Akarun, L. (2013). Real Time Hand Pose Estimation Using Depth Sensors. In: Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. (eds) Consumer Depth Cameras for Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4640-7_7

  • DOI: https://doi.org/10.1007/978-1-4471-4640-7_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4639-1

  • Online ISBN: 978-1-4471-4640-7

  • eBook Packages: Computer Science, Computer Science (R0)