Abstract
For one-shot learning gesture recognition, two important challenges are how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation and provides a more compact and richer visual representation. For learning a discriminative model, all features extracted from training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike traditional bag-of-features models, which use vector quantization (VQ) to map each feature to a single visual codeword, a sparse coding method named simultaneous orthogonal matching pursuit (SOMP) is applied, so that each feature can be represented by a linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on the ChaLearn gesture database, and the result ranked among the top-performing techniques in the ChaLearn gesture challenge (round 2).
Editors: Isabelle Guyon and Vassilis Athitsos.
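The codebook and coding steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptors are random stand-ins for real features, the function names (`kmeans_codebook`, `vq_error`, `somp`), the codebook size of 32 and the budget of 3 atoms are all assumptions chosen for illustration, and the greedy routine follows only the general idea of simultaneous orthogonal matching pursuit (a shared support selected by summed correlations, followed by a least-squares fit).

```python
import numpy as np

def kmeans_codebook(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means. X is (n_features, dim); returns (k, dim) centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def vq_error(X, centers):
    """Total squared error when each feature is replaced by its nearest codeword."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def somp(D, Y, n_atoms):
    """Greedy simultaneous OMP sketch.

    D: (dim, K) dictionary with unit-norm columns.
    Y: (dim, n) signals that share one support.
    Returns a (K, n) coefficient matrix with at most n_atoms non-zero rows.
    """
    residual = Y.copy()
    support = []
    sol = np.zeros((0, Y.shape[1]))
    for _ in range(n_atoms):
        # Score each atom by its summed absolute correlation with all residuals.
        score = np.abs(D.T @ residual).sum(axis=1)
        if support:
            score[support] = -np.inf  # never pick the same atom twice
        support.append(int(score.argmax()))
        # Refit all signals on the selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        residual = Y - D[:, support] @ sol
    coef = np.zeros((D.shape[1], Y.shape[1]))
    coef[support] = sol
    return coef

# Random stand-ins for descriptors (assumption: 200 features of dimension 16).
rng = np.random.default_rng(0)
X = rng.random((200, 16))

centers = kmeans_codebook(X, k=32)
D = centers.T / np.linalg.norm(centers, axis=1)  # unit-norm atoms, shape (16, 32)

err_vq = vq_error(X, centers)
# Coding one feature at a time reduces simultaneous OMP to ordinary OMP.
err_sparse = sum(
    np.linalg.norm(x[:, None] - D @ somp(D, x[:, None], 3)) ** 2 for x in X
)
group_coef = somp(D, X[:5].T, 3)  # five features coded jointly, shared support
print(f"VQ error: {err_vq:.3f}, sparse-coding error: {err_sparse:.3f}")
```

In this sketch the sparse-coding error cannot exceed the VQ error when features are coded one at a time: a least-squares fit on the best single unit-norm atom is already at least as good as substituting the nearest codeword, and each further atom can only reduce the residual. When several features are coded jointly they must share a support, so that guarantee no longer holds per feature.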
Notes
1. The depth values are normalized to [0, 255] in depth videos.
2. MoSIFT and 3D MoSIFT have the same strategy to detect interest points.
3. Here, \(\beta _{1}=0.005\) according to the reference (Ming et al. 2012).
4. Here, \(\beta _{1}=\beta _{2}=0.005\).
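The first note states that depth values are normalized to the 8-bit range but does not say how. The sketch below assumes a per-frame min-max rescaling, which is only one plausible choice; the function name `normalize_depth` and the sample depth values are hypothetical.

```python
import numpy as np

def normalize_depth(depth):
    """Rescale a depth frame to integers in [0, 255].

    Assumption: per-frame min-max scaling. The source only states the
    target range, not the normalization rule.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi == lo:  # flat frame: avoid division by zero
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

# Hypothetical raw depth readings for a 2x2 frame.
frame = np.array([[800, 1200], [1600, 4000]])
out = normalize_depth(frame)
print(out)
```

Scaling each frame independently stretches contrast but makes absolute depth incomparable across frames; a fixed sensor range would be the alternative if cross-frame consistency matters.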
References
G. Bradski, The OpenCV Library, Dr. Dobb’s Journal of Software Tools, 2000
M. Brand, N. Oliver, A. Pentland. Coupled hidden markov models for complex action recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp 994–999
C.C. Chang, C.J. Lin. Libsvm: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3):27:1–27:27, 2011
F.S. Chen, C.M. Fu, C.L. Huang, Hand gesture recognition using a real-time tracking method and hidden markov models. Image Vis. Comput. 21, 745–758 (2003)
M. Chen, A. Hauptmann. Mosift: Recognizing human actions in surveillance videos. Technical Report, 2009
H. Cooper, E.J. Ong, N. Pugeault, R. Bowden, Sign language recognition using sub-units. J. Mach. Learn. Res. 13, 2205–2231 (2012)
A. Corradini. Dynamic time warping for off-line recognition of a small gesture vocabulary, in IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001, pp. 82–89
N.H. Dardas, N.D. Georganas, Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Trans. Instrum. Meas. 60(11), 3592–3607 (2011)
P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features, in Proceedings of IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005, pp. 65–72
H.J. Escalante, I. Guyon. Principal motion: Pca-based reconstruction of motion histograms. Technical Memorandum, 2012
L. Fei-Fei, P. Perona, A bayesian hierarchical model for learning natural scene categories. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2, 524–531 (2005)
F. Flórez, J.M. García, J. García, A. Hernández. Hand gesture recognition following the dynamics of a topology-preserving network, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 318–323
P.-E. Forssen, D.G. Lowe. Shape descriptors for maximally stable extremal regions, in IEEE 11th International Conference on Computer Vision, 2007, pp. 1–8
W.T. Freeman, M. Roth, Orientation histograms for hand gesture recognition. Proc. IEEE Int. Workshop Autom. Face Gesture Recognit. 12, 296–301 (1995)
W. Gao, G. Fang, D. Zhao, Y. Chen, A chinese sign language recognition system based on sofm/srn/hmm. Pattern Recognit. 37(12), 2389–2402 (2004)
T. Guha, R.K. Ward, Learning sparse representations for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34(8), 1576–1588 (2012)
S. Guo, Z. Wang, Q. Ruan, Enhancing sparsity via \(\ell _{p}\) (0\(<\)p\(<\)1) minimization for robust face recognition. Neurocomputing 99, 592–602 (2013)
I. Guyon, V. Athitsos, P. Jangyodsuk, B. Hamner, and H.J. Escalante. Chalearn gesture challenge: Design and first results, in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 1–6
I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, B. Hamner. Results and analysis of the chalearn gesture challenge 2012. Technical Report, 2013
C. Harris and M. Stephens. A combined corner and edge detector, in Proceedings of Alvey Vision Conference, volume 15, p. 50, 1988
A. Hernández-Vela, M. A. Bautista, X. Perez-Sala, V. Ponce, X. Baró, O. Pujol, C. Angulo, S. Escalera. Bovdw: Bag-of-visual-and-depth-words for gesture recognition. 21st International Conference on Pattern Recognition (ICPR), 2012
D. Kim, J. Song, D. Kim, Simultaneous gesture segmentation and recognition based on forward spotting accumulative hmms. Pattern Recognit. 40(11), 3012–3026 (2007)
I. Laptev, On space-time interest points. Int. J. Comput. Vis. 64(2), 107–123 (2005)
J.F. Lichtenauer, E.A. Hendriks, M.J.T. Reinders, Sign language recognition by combining statistical dtw and independent classification. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 2040–2046 (2008)
Y. Linde, A. Buzo, R. Gray, An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)
D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
B.D. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision, in Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981
Y.M. Lui, Human gesture recognition on product manifolds. J. Mach. Learn. Res. 13, 3297–3321 (2012)
M.R. Malgireddy, I. Nwogu, V. Govindaraju. A temporal bayesian model for classifying, detecting and localizing activities in video sequences, in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 43–48
A. Malima, E. Ozgur, M. Çetin. A fast algorithm for vision-based hand gesture recognition for robot control, in Proceedings of IEEE Signal Processing and Communications Applications, 2006, pp. 1–4
Y. Ming, Q. Ruan, A.G. Hauptmann. Activity recognition from rgb-d camera with 3d local spatio-temporal features, in Proceedings of IEEE International Conference on Multimedia and Expo, 2012, pp. 344–349
L.P. Morency, A. Quattoni, T. Darrell. Latent-dynamic discriminative models for continuous gesture recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8
B.A. Olshausen, D.J. Field et al., Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37(23), 3311–3326 (1997)
V.I. Pavlovic, R. Sharma, T.S. Huang, Visual interpretation of hand gestures for human-computer interaction: a review. IEEE Trans. Pattern Anal. Mach. Intell. 19, 677–695 (1997)
A. Rakotomamonjy, Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process. 91(7), 1505–1526 (2011)
S. Reifinger, F. Wallhoff, M. Ablassmeier, T. Poitschke, and G. Rigoll. Static and dynamic hand-gesture recognition for augmented reality applications, in Proceedings of the 12th International Conference on Human-computer Interaction: Intelligent Multimodal Interaction Environments, 2007, pp. 728–737
Y. Ruiduo, S. Sarkar, and B. Loeding. Enhanced level building algorithm for the movement epenthesis problem in sign language recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8
C. Shan, T. Tan, Y. Wei, Real-time hand tracking using a mean shift embedded particle filter. Pattern Recognit. 40(7), 1958–1970 (2007)
X. Shen, G. Hua, L. Williams, Y. Wu, Dynamic hand gesture recognition: an exemplar-based approach from motion divergence fields. Image Vis. Comput. 30(3), 227–235 (2012)
C. Sminchisescu, A. Kanaujia, Zhiguo Li, D. Metaxas. Conditional models for contextual human motion recognition, in Tenth IEEE International Conference on Computer Vision, volume 2, pp. 1808–1815, 2005
H.I. Suk, B.K. Sin, S.W. Lee, Hand gesture recognition based on dynamic bayesian network framework. Pattern Recognit. 43(9), 3059–3072 (2010)
J. Weaver, T. Starner, A. Pentland, Real-time american sign language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20, 1371–1375 (1998)
J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. part i: Greedy pursuit. Signal Process. 86(3), 572–588 (2006)
A. Vedaldi, B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms, http://www.vlfeat.org/, 2008
A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theor. 13(2), 260–269 (1967)
C. P. Vogler. American Sign Language Recognition: Reducing the Complexity of the Task with Phoneme-based Modeling and Parallel Hidden Markov Models. Ph.D. thesis, University of Pennsylvania, 2003
J. Wan, Q. Ruan, G. An, W. Li. Gesture recognition based on hidden markov model from sparse representative observations, in IEEE 10th International Conference on Signal Processing (ICSP), 2012, pp. 1180–1183
H. Wang, M.M. Ullah, A. Klaser, I. Laptev, C. Schmid, et al. Evaluation of local spatio-temporal features for action recognition, in Proceedings of British Machine Vision Conference, 2009
J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong. Locality-constrained linear coding for image classification, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367
S.B. Wang, A. Quattoni, L.P. Morency, D. Demirdjian, T. Darrell, Hidden conditional random fields for gesture recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2, 1521–1527 (2006)
J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Yi Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 210–227 (2009)
J. Yamato, Jun Ohya, and K. Ishii. Recognizing human action in time-sequential images using hidden markov model, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1992, pp. 379–385
J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1794–1801
M.H. Yang, N. Ahuja, M. Tabb, Extraction of 2d motion trajectories and its application to hand gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 24, 1061–1074 (2002)
D. Youtian, C. Feng, X. Wenli, Li. Yongbin. Recognizing interaction activities using dynamic bayesian network, in 18th International Conference on Pattern Recognition, volume 1, pp. 618–621, 2006
Y. Zhu, G. Xu, D.J. Kriegman, A real-time approach to the spotting, representation, and recognition of hand gestures for human-computer interaction. Comput. Vis. Image Underst. 85(3), 189–208 (2002)
Acknowledgements
We thank ChaLearn, whose directors are gratefully acknowledged, for providing the gesture database (http://chalearn.org). We thank Isabelle Guyon (ChaLearn, Berkeley, California) for insightful comments and suggestions that improved our manuscript. We are also grateful to the editors and anonymous reviewers, whose instructive suggestions have improved the quality of this paper. This project was supported by the National Natural Science Foundation (60973060, 61003114, 61172128), the National 973 plans project (2012CB316304), the fundamental research funds for the central universities (2011JBM020, 2011JBM022), and the program for Innovative Research Team in University of Ministry of Education of China (IRT 201206).
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Wan, J., Ruan, Q., Li, W., Deng, S. (2017). One-Shot Learning Gesture Recognition from RGB-D Data Using Bag of Features. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_11
DOI: https://doi.org/10.1007/978-3-319-57021-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1
eBook Packages: Computer Science, Computer Science (R0)