Abstract
This paper proposes a novel multi-layered gesture recognition method using Kinect. We exploit two essential linguistic characteristics of gestures, the concurrency of their components and their sequential organization, within a multi-layered framework that extracts features from both the segmented semantic units and the whole gesture sequence and then sequentially classifies the motion, location, and shape components. In the first layer, an improved principal motion model is applied to the motion component. In the second layer, a particle-based descriptor and a weighted dynamic time warping algorithm are proposed for classifying the location component. In the last layer, spatial path warping is further proposed to classify the shape component, represented by an unclosed shape context. The proposed method achieves relatively high performance for one-shot learning gesture recognition on the ChaLearn Gesture Dataset, which comprises more than 50,000 gesture sequences recorded with Kinect.
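To illustrate the weighted dynamic time warping used in the second layer, the following is a minimal sketch. The per-dimension weighting of the frame distance is a generic formulation assumed for illustration, not necessarily the authors' exact scheme:

```python
import numpy as np

def weighted_dtw(seq_a, seq_b, weights=None):
    """Weighted DTW distance between two feature sequences.

    seq_a, seq_b: arrays of shape (n, d) and (m, d), one row per frame.
    weights: optional per-dimension weights of shape (d,); uniform if None.
    """
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    n, m = len(seq_a), len(seq_b)
    w = np.ones(seq_a.shape[1]) if weights is None else np.asarray(weights, dtype=float)

    # Pairwise weighted Euclidean distance between frames.
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            diff = seq_a[i] - seq_b[j]
            cost[i, j] = np.sqrt(np.sum(w * diff * diff))

    # Standard DTW recursion over the accumulated-cost matrix.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]
```

Time-warped copies of the same trajectory (e.g. a frame repeated) yield distance zero, which is the property that makes DTW suitable for gestures performed at varying speeds.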
Editors: Sergio Escalera, Isabelle Guyon and Vassilis Athitsos
Notes
- 1. In this paper, we use the term “gesture sequence” to mean an image sequence that contains only one complete gesture and “multi-gesture sequence” to mean an image sequence that may contain one or more gesture sequences.
- 2. Available at http://gesture.chalearn.org/data/sample-code.
References
T. Agrawal, S. Chaudhuri, Gesture recognition using motion histogram, in Proceedings of the Indian National Conference of Communications, 2003, pp. 438–442
O. Al-Jarrah, A. Halawani, Recognition of gestures in Arabic sign language using neuro-fuzzy systems. Artif. Intell. 133(1), 117–138 (2001)
G. Awad, J. Han, A. Sutherland, A unified system for segmentation and tracking of face and hands in sign language recognition, in Proceedings of the 18th International Conference on Pattern Recognition, vol. 1, 2006, pp. 239–242
M. Baklouti, E. Monacelli, V. Guitteny, S. Couvet, Intelligent assistive exoskeleton with vision based interface, in Proceedings of the 5th International Conference On Smart Homes and Health Telematics, 2008, pp. 123–135
B. Bauer, K.-F. Kraiss, Video-based sign recognition using self-organizing subunits, in Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, 2002, pp. 434–437
S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4), 509–522 (2002)
Q. Cai, D. Gallup, C. Zhang, Z. Zhang, 3D deformable face tracking with a commodity depth camera, in Proceedings of the 11th European Conference on Computer Vision, 2010, pp. 229–242
X. Chen, M. Koskela, Online RGB-D gesture recognition with extreme learning machines, in Proceedings of the 15th ACM International Conference on Multimodal Interaction, 2013, pp. 467–474
H. Cooper, B. Holt, R. Bowden, Sign language recognition, in Visual Analysis of Humans, 2011, pp. 539–562
H. Cooper, E.-J. Ong, N. Pugeault, R. Bowden, Sign language recognition using sub-units. J. Mach. Learn. Res. 13, 2205–2231 (2012)
A. Corradini, Real-time gesture recognition by means of hybrid recognizers, in Proceedings of International Gesture Workshop on Gesture and Sign Languages in Human-Computer Interaction, 2002, pp. 34–47
R. Cutler, M. Turk, View-based interpretation of real-time optical flow for gesture recognition, in Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 1998, pp. 416–416
N. Dardas, Real-time hand gesture detection and recognition for human computer interaction, Ph.D. thesis, University of Ottawa, 2012
P. Doliotis, A. Stefan, C. Mcmurrough, D. Eckhard, V. Athitsos, Comparing gesture recognition accuracy using color and depth information, in Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, 2011, p. 20
J. Edmonds, Maximum matching and a polyhedron with 0, 1-vertices. J. Res. Natl. Bur. Stand. B 69, 125–130 (1965)
H. Ershaed, I. Al-Alali, N. Khasawneh, M. Fraiwan, An Arabic sign language computer interface using the Xbox Kinect, in Proceedings of the Annual Undergraduate Research Conference on Applied Computing, vol. 1, 2011
H. Escalante, I. Guyon, Principal Motion, 2012, http://www.causality.inf.ethz.ch/Gesture/principal_motion.pdf
H.J. Escalante, I. Guyon, V. Athitsos, P. Jangyodsuk, J. Wan, Principal motion components for gesture recognition using a single-example, 2013, arXiv:1310.4822
S.R. Fanello, I. Gori, G. Metta, F. Odone, One-shot learning for real-time action recognition, in Pattern Recognition and Image Analysis, 2013, pp. 31–40
G. Fang, W. Gao, D. Zhao, Large vocabulary sign language recognition based on fuzzy decision trees. IEEE Trans. Syst. Man Cybern. A 34(3), 305–314 (2004)
A. Fornés, S. Escalera, J. Lladós, E. Valveny, Symbol classification using dynamic aligned shape descriptor, in Proceedings of the 20th International Conference on Pattern Recognition, 2010, pp. 1957–1960
I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, B. Hamner, Results and analysis of the ChaLearn gesture challenge 2012, in Proceedings of International Workshop on Advances in Depth Image Analysis and Applications, 2013, pp. 186–204
I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, The ChaLearn gesture dataset (CGD 2011). Mach. Vis. Appl. 25(8), 1929–1951 (2014). doi:10.1007/s00138-014-0596-3
C.-L. Huang, W.-Y. Huang, Sign language recognition using model-based tracking and a 3D Hopfield neural network. Mach. Vis. Appl. 10(5–6), 292–307 (1998)
G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. B 42(2), 513–529 (2012)
Y.-S. Jeong, M.K. Jeong, O.A. Omitaomu, Weighted dynamic time warping for time series classification. Pattern Recognit. 44(9), 2231–2240 (2011)
T. Kadir, R. Bowden, E.J. Ong, A. Zisserman, Minimal training, large lexicon, unconstrained sign language recognition, in Proceedings of the British Machine Vision Conference, vol. 1, 2004, pp. 1–10
H.W. Kuhn, The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1–2), 83–97 (1955)
V.I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, in Soviet Physics Doklady, vol. 10, 1966, p. 707
J.F. Lichtenauer, E.A. Hendriks, M.J.T. Reinders, Sign language recognition by combining statistical DTW and independent classification. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 2040–2046 (2008)
S.K. Liddell, R.E. Johnson, American sign language. Sign Lang. Stud. 64, 195–278 (1989)
Y.M. Lui, Human gesture recognition on product manifolds. J. Mach. Learn. Res. 13(1), 3297–3321 (2012a)
Y.M. Lui, A least squares regression framework on manifolds and its application to gesture recognition, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012b, pp. 13–18
U. Mahbub, T. Roy, M.S. Rahman, H. Imtiaz, One-shot-learning gesture recognition using motion history based gesture silhouettes, in Proceedings of the International Conference on Industrial Application Engineering, 2013, pp. 186–193
M.R. Malgireddy, I. Inwogu, V. Govindaraju, A temporal Bayesian model for classifying, detecting and localizing activities in video sequences, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 43–48
M. Maraqa, R. Abu-Zaiter, Recognition of Arabic Sign Language (ArSL) using recurrent neural networks, in Proceedings of the First International Conference on the Applications of Digital Information and Web Technologies, 2008, pp. 478–481
T.H.H. Maung, Real-time hand tracking and gesture recognition system using neural networks. World Acad. Sci. Eng. Technol. 50, 466–470 (2009)
S. Mitra, T. Acharya, Gesture recognition: a survey. IEEE Trans. Syst. Man Cybern. C 37(3), 311–324 (2007)
S. Mu-Chun, A fuzzy rule-based approach to spatio-temporal hand gesture recognition. IEEE Trans. Syst. Man Cybern. C 30(2), 276–281 (2000)
K. Nickel, R. Stiefelhagen, Visual recognition of pointing gestures for human-robot interaction. Image Vis. Comput. 25(12), 1875–1884 (2007)
I. Oikonomidis, N. Kyriazis, A. Argyros, Efficient model-based 3D tracking of hand articulations using Kinect, in Proceedings of the British Machine Vision Conference, 2011, pp. 1–11
E.-J. Ong, R. Bowden, A boosted classifier tree for hand shape detection, in Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 889–894
A. Ramamoorthy, N. Vaswani, S. Chaudhury, S. Banerjee, Recognition of dynamic hand gestures. Pattern Recognit. 36(9), 2069–2081 (2003)
I. Rauschert, P. Agrawal, R. Sharma, S. Fuhrmann, I. Brewer, A. MacEachren, Designing a human-centered, multimodal GIS interface to support emergency management, in Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, 2002, pp. 119–124
Z. Ren, J. Yuan, J. Meng, Z. Zhang, Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans. Multimed. 15(5), 1110–1120 (2013)
M. Reyes, G. Dominguez, S. Escalera, Feature weighting in dynamic time warping for gesture recognition in depth data, in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011, pp. 1182–1188
Y. Sabinas, E.F. Morales, H.J. Escalante, A one-shot DTW-based method for early gesture recognition, in Proceedings of 18th Iberoamerican Congress on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 2013, pp. 439–446
H.J. Seo, P. Milanfar, Action recognition from one example. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 867–882 (2011)
L. Shao, L. Ji, Motion histogram analysis based key frame extraction for human action/activity representation, in Proceedings of Canadian Conference on Computer and Robot Vision, 2009, pp. 88–92
J. Shotton, A.W. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake, Real-time human pose recognition in parts from single depth images, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1297–1304
E. Stergiopoulou, N. Papamarkos, Hand gesture recognition using a neural network shape fitting technique. Eng. Appl. Artif. Intell. 22(8), 1141–1158 (2009)
W.C. Stokoe, Sign language structure: an outline of the visual communication systems of the American deaf. Studies in Linguistics, Occasional Papers, 8, 1960
C.P. Vogler, American Sign Language recognition: reducing the complexity of the task with phoneme-based modeling and parallel hidden Markov models, Ph.D. thesis, University of Pennsylvania, 2003
C. Vogler, D. Metaxas, Parallel hidden Markov models for American Sign Language recognition, in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1, 1999, pp. 116–122
J. Wachs, M. Kolsch, H. Stern, Y. Edan, Vision-based hand-gesture applications. Commun. ACM 54(2), 60–71 (2011)
J. Wan, Q. Ruan, G. An, W. Li, Gesture recognition based on hidden Markov model from sparse representative observations, in Proceedings of the IEEE 11th International Conference on Signal Processing, vol. 2, 2012a, pp. 1180–1183
J. Wan, Q. Ruan, G. An, W. Li, Hand tracking and segmentation via graph cuts and dynamic model in sign language videos, in Proceedings of IEEE 11th International Conference on Signal Processing, vol. 2 (IEEE, Piscataway, 2012b), pp. 1135–1138
J. Wan, Q. Ruan, W. Li, S. Deng, One-shot learning gesture recognition from RGB-D data using bag of features. J. Mach. Learn. Res. 14(1), 2549–2582 (2013)
J. Wan, V. Athitsos, P. Jangyodsuk, H.J. Escalante, Q. Ruan, I. Guyon, CSMMI: class-specific maximization of mutual information for action and gesture recognition. IEEE Trans. Image Process. 23(7), 3152–3165 (2014a)
J. Wan, Q. Ruan, W. Li, G. An, R. Zhao, 3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos. J. Electron. Imaging 23(2), 023017 (2014b)
C. Wang, W. Gao, S. Shan, An approach based on phonemes to large vocabulary Chinese sign language recognition, in Proceedings of the IEEE Conference on Automatic Face and Gesture Recognition, 2002, pp. 411–416
J. Wang, Z. Liu, Y. Wu, J. Yuan, Mining actionlet ensemble for action recognition with depth cameras, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1290–1297
S.-F. Wong, T.-K. Kim, R. Cipolla, Learning motion categories using both semantic and structural information, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–6
D. Wu, F. Zhu, L. Shao, One shot learning gesture recognition from RGBD images, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012a, pp. 7–12
S. Wu, F. Jiang, D. Zhao, S. Liu, W. Gao, Viewpoint-independent hand gesture recognition system, in Proceedings of the IEEE Conference on Visual Communications and Image Processing, 2012b, pp. 43–48
M. Zahedi, D. Keysers, H. Ney, Appearance-based recognition of words in American Sign Language, in Proceedings of Second Iberian Conference on Pattern Recognition and Image Analysis, 2005, pp. 511–519
L.-G. Zhang, Y. Chen, G. Fang, X. Chen, W. Gao, A vision-based sign language recognition system using tied-mixture density HMM, in Proceedings of the 6th International Conference on Multimodal Interfaces, 2004, pp. 198–204
Acknowledgements
We would like to thank the editors and reviewers, whose valuable comments greatly improved the manuscript. We would especially like to thank Escalante and Guyon, who kindly provided the principal motion source code, and Microsoft Asia, which kindly provided two sets of Kinect devices. This work was supported in part by the Major State Basic Research Development Program of China (973 Program, 2015CB351804) and the National Natural Science Foundation of China under Grants 61272386, 61100096, and 61300111.
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Jiang, F., Zhang, S., Wu, S., Gao, Y., Zhao, D. (2017). Multi-layered Gesture Recognition with Kinect. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1