
Multi-layered Gesture Recognition with Kinect

Part of the book series: The Springer Series on Challenges in Machine Learning ((SSCML))

Abstract

This paper proposes a novel multi-layered gesture recognition method with Kinect. We explore two essential linguistic characteristics of gestures, the concurrency of gesture components and their sequential organization, in a multi-layered framework that extracts features from both the segmented semantic units and the whole gesture sequence, and then sequentially classifies the motion, location, and shape components. In the first layer, an improved principal motion model is applied to the motion component. In the second layer, a particle-based descriptor and a weighted dynamic time warping are proposed for classifying the location component. In the last layer, spatial path warping is further proposed to classify the shape component, represented by an unclosed shape context. The proposed method achieves relatively high performance for one-shot learning gesture recognition on the ChaLearn Gesture Dataset, which comprises more than 50,000 gesture sequences recorded with Kinect.
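The second layer of the abstract's pipeline matches location-feature sequences with a weighted dynamic time warping. As a rough illustration of that idea (not the authors' exact formulation), the sketch below implements a generic weighted DTW in which a caller-supplied `weight_fn` scales the frame-to-frame matching cost; with the default uniform weight it reduces to ordinary DTW. The function name and interface are hypothetical.

```python
def weighted_dtw(a, b, weight_fn=None):
    """Weighted DTW distance between feature sequences a and b,
    each a list of equal-length feature vectors (lists of floats).

    weight_fn(i, j) returns a multiplicative weight for matching
    frame i of `a` to frame j of `b`; the default weight of 1.0
    for every pair reduces this to standard DTW.
    """
    n, m = len(a), len(b)
    if weight_fn is None:
        weight_fn = lambda i, j: 1.0
    INF = float("inf")
    # D[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames
            dist = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost = weight_fn(i - 1, j - 1) * dist
            # Allow insertion, deletion, or match steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

In a one-shot setting such as ChaLearn's, a test sequence would typically be classified by computing this distance against the single training template of each class and picking the nearest one.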

Editors: Sergio Escalera, Isabelle Guyon and Vassilis Athitsos


Notes

  1. In this paper, we use the term “gesture sequence” to mean an image sequence that contains only one complete gesture, and “multi-gesture sequence” to mean an image sequence that may contain one or more gesture sequences.

  2. Available at http://gesture.chalearn.org/data/sample-code.


Acknowledgements

We would like to thank the editors and reviewers, whose valuable comments greatly improved the manuscript. In particular, we thank Escalante and Guyon, who kindly provided the principal motion source code, and Microsoft Asia, which kindly provided two sets of Kinect devices. This work was supported in part by the Major State Basic Research Development Program of China (973 Program, 2015CB351804) and by the National Natural Science Foundation of China under Grants 61272386, 61100096, and 61300111.

Author information


Corresponding author

Correspondence to Feng Jiang.



Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Jiang, F., Zhang, S., Wu, S., Gao, Y., Zhao, D. (2017). Multi-layered Gesture Recognition with Kinect. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_13

  • DOI: https://doi.org/10.1007/978-3-319-57021-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science (R0)
