Abstract
Object pose estimation by manifold learning has recently become an active research area. In this paper, we propose an efficient method based on Locality Preserving Projections that recovers the pose and viewpoint of numerous hand gestures from monocular videos. We first select a set of dynamic hand gestures as primitive hand motions and build a 3D-2D mapping table that relates the 3D joint angles of sampled static poses to their projected silhouettes from arbitrary viewpoints. An embedding space and an explicit mapping function are then learnt for every primitive motion. To classify and predict among those embedding spaces, we also propose a Subspace Filtering Algorithm, which recognizes and recovers numerous dynamic hand gestures as combinations of primitive gestures. Finally, using skin color cues and oriented k-DOPs, multiple hands can be labeled and tracked separately and accurately. Extensive experimental results demonstrate, both qualitatively and quantitatively, that our method recovers the 3D pose of hands robustly and efficiently.
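As an illustration of the embedding step, the following is a minimal sketch of standard Locality Preserving Projections in NumPy. It is not the authors' implementation: the function names `lpp_fit` and `lpp_embed`, the parameters `n_neighbors`, `t` and `reg`, and the use of raw feature vectors in place of silhouette descriptors are all assumptions made for the example. Because LPP is linear, the learnt matrix acts as an explicit mapping function that can also be applied to unseen samples.

```python
import numpy as np

def lpp_fit(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Learn an LPP projection matrix (hypothetical helper, standard LPP).

    X: array of shape (n_samples, n_features), e.g. silhouette descriptors.
    Returns A of shape (n_features, n_components).
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbours

    # k-nearest-neighbour graph with heat-kernel weights, symmetrized.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # Generalized eigenproblem (X^T L X) a = lambda (X^T D X) a.
    # A small ridge term (assumption) keeps the right-hand matrix positive definite.
    S_l = X.T @ L @ X
    S_d = X.T @ D @ X + reg * np.eye(d)

    # Reduce to a standard symmetric problem through a Cholesky factor of S_d.
    C = np.linalg.cholesky(S_d)
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_l @ C_inv.T
    M = (M + M.T) / 2.0         # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M) # eigenvalues in ascending order

    # The eigenvectors with the smallest eigenvalues preserve locality best.
    return C_inv.T @ vecs[:, :n_components]

def lpp_embed(X, A):
    """Explicit linear mapping y = A^T x, applied row-wise."""
    return X @ A

# Usage on synthetic data standing in for real descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
A = lpp_fit(X, n_components=2, n_neighbors=5)
Y = lpp_embed(X, A)
```

In practice the input is often reduced with PCA first so that the right-hand matrix is well conditioned; the ridge term `reg` plays that role in this sketch.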
Acknowledgments
The research described in this paper was funded by the Doctor Startup Fund of Liaoning Province, China (20111023), the National Natural Science Funds of China (61033012, 61003177, 61272371, 11171052 and 61173104), the Program for New Century Excellent Talents (NCET-11-0048), and the Specialized Research Fund for the Doctoral Program of Higher Education (20120041120050).
Cite this article
Wang, Y., Luo, Z., Liu, J. et al. Real-time estimation of hand gestures based on manifold learning from monocular videos. Multimed Tools Appl 71, 555–574 (2014). https://doi.org/10.1007/s11042-013-1524-7
DOI: https://doi.org/10.1007/s11042-013-1524-7