Real-time estimation of hand gestures based on manifold learning from monocular videos

Multimedia Tools and Applications

Abstract

Object pose estimation by manifold learning has recently become an active research area. In this paper, we propose an efficient method based on Locality Preserving Projections (LPP) that recovers the pose and viewpoint of numerous hand gestures from monocular videos. We first select several dynamic hand gestures as primitive hand motions and build a 3D-2D mapping table that relates the 3D joint angles of sampled static poses to their projected silhouettes from arbitrary viewpoints. An embedding space and an explicit mapping function are then learned for every primitive motion. To classify and predict among these embedding spaces, we also propose a Subspace Filtering Algorithm, which recognizes and recovers numerous dynamic hand gestures as combinations of primitive gestures. Finally, using skin color cues and oriented k-DOPs, multiple hands can be labeled and tracked separately and accurately. Extensive experiments demonstrate, both qualitatively and quantitatively, that our method recovers 3D hand pose robustly and efficiently.
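
To make the embedding step concrete, the following is a minimal sketch of standard Locality Preserving Projections, not the authors' implementation. It assumes a k-nearest-neighbour graph with heat-kernel weights. The function name `lpp`, its parameters (`n_neighbors`, `t`, `reg`), and the random 7-dimensional vectors standing in for silhouette descriptors are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Standard LPP sketch. X: (n_samples, n_features).

    Returns a linear projection matrix of shape (n_features, n_components).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    W = np.zeros((n, n))
    order = np.argsort(dist2, axis=1)
    for i in range(n):
        nbrs = order[i, 1:n_neighbors + 1]  # index 0 is normally the point itself
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # Generalized eigenproblem (X^T L X) a = lambda (X^T D X) a.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)  # small ridge keeps B positive definite

    # Reduce to a standard symmetric eigenproblem via the Cholesky factor of B.
    C = np.linalg.cholesky(B)          # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0                # remove round-off asymmetry
    _, evecs = np.linalg.eigh(M)       # eigenvalues in ascending order

    # The smallest eigenvalues give the locality-preserving directions.
    return C_inv.T @ evecs[:, :n_components]


# Hypothetical data: 60 samples of a 7-dimensional silhouette descriptor.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 7))
P = lpp(features, n_components=3)
embedded = features @ P  # explicit linear mapping into the embedding space
```

Because LPP yields an explicit linear map, an unseen descriptor can be embedded with one matrix product. This is one reason a linear method can suit real-time use better than nonlinear embeddings that have no out-of-sample mapping.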

Acknowledgments

The research described in this paper was funded by the Doctor Startup Fund of Liaoning Province, China (20111023), the National Natural Science Funds of China (61033012, 61003177, 61272371, 11171052 and 61173104), the Program for New Century Excellent Talents (NCET-11-0048), and the Specialized Research Fund for the Doctoral Program of Higher Education (20120041120050).

Author information

Corresponding author

Correspondence to Xin Fan.

About this article

Cite this article

Wang, Y., Luo, Z., Liu, J. et al. Real-time estimation of hand gestures based on manifold learning from monocular videos. Multimed Tools Appl 71, 555–574 (2014). https://doi.org/10.1007/s11042-013-1524-7

  • DOI: https://doi.org/10.1007/s11042-013-1524-7
