Abstract
This paper proposes a novel multi-layered gesture recognition method using Kinect. We exploit two essential linguistic characteristics of gestures, the concurrency of their components and their sequential organization, within a multi-layered framework that extracts features from both the segmented semantic units and the whole gesture sequence and then sequentially classifies the motion, location, and shape components. In the first layer, an improved principal motion model is applied to the motion component. In the second layer, a particle-based descriptor and a weighted dynamic time warping algorithm are proposed for classifying the location component. In the last layer, spatial path warping is further proposed to classify the shape component, represented by an unclosed shape context. The proposed method achieves relatively high performance for one-shot learning gesture recognition on the ChaLearn Gesture Dataset, which comprises more than 50,000 gesture sequences recorded with Kinect.
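To illustrate the weighted dynamic time warping used in the second layer, the following is a minimal sketch. The per-dimension weighting of the frame distance is a generic formulation assumed for illustration, not necessarily the authors' exact scheme:

```python
import numpy as np

def weighted_dtw(seq_a, seq_b, weights=None):
    """Weighted DTW distance between two feature sequences.

    seq_a, seq_b: arrays of shape (n, d) and (m, d), one row per frame.
    weights: optional per-dimension weights of shape (d,); uniform if None.
    """
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    n, m = len(seq_a), len(seq_b)
    w = np.ones(seq_a.shape[1]) if weights is None else np.asarray(weights, dtype=float)

    # Pairwise weighted Euclidean distance between frames.
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            diff = seq_a[i] - seq_b[j]
            cost[i, j] = np.sqrt(np.sum(w * diff * diff))

    # Standard DTW recursion over the accumulated-cost matrix.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]
```

Time-warped copies of the same trajectory (e.g. a frame repeated) yield distance zero, which is the property that makes DTW suitable for gestures performed at varying speeds.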
Editors: Sergio Escalera, Isabelle Guyon and Vassilis Athitsos
Notes
- 1. In this paper, we use the term “gesture sequence” to mean an image sequence that contains only one complete gesture and “multi-gesture sequence” to mean an image sequence that may contain one or more gesture sequences.
- 2. Available at http://gesture.chalearn.org/data/sample-code.
References
T. Agrawal, S. Chaudhuri, Gesture recognition using motion histogram, in Proceedings of the Indian National Conference of Communications, 2003, pp. 438–442
O. Al-Jarrah, A. Halawani, Recognition of gestures in Arabic sign language using neuro-fuzzy systems. Artif. Intell. 133(1), 117–138 (2001)
G. Awad, J. Han, A. Sutherland, A unified system for segmentation and tracking of face and hands in sign language recognition, in Proceedings of the 18th International Conference on Pattern Recognition, vol. 1, 2006, pp. 239–242
M. Baklouti, E. Monacelli, V. Guitteny, S. Couvet, Intelligent assistive exoskeleton with vision based interface, in Proceedings of the 5th International Conference On Smart Homes and Health Telematics, 2008, pp. 123–135
B. Bauer, K.-F. Kraiss, Video-based sign recognition using self-organizing subunits, in Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, 2002, pp. 434–437
S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4), 509–522 (2002)
Q. Cai, D. Gallup, C. Zhang, Z. Zhang, 3D deformable face tracking with a commodity depth camera, in Proceedings of the 11th European Conference on Computer Vision, 2010, pp. 229–242
X. Chen, M. Koskela, Online RGB-D gesture recognition with extreme learning machines, in Proceedings of the 15th ACM International Conference on Multimodal Interaction, 2013, pp. 467–474
H. Cooper, B. Holt, R. Bowden, Sign language recognition, in Visual Analysis of Humans, 2011, pp. 539–562
H. Cooper, E.-J. Ong, N. Pugeault, R. Bowden, Sign language recognition using sub-units. J. Mach. Learn. Res. 13, 2205–2231 (2012)
A. Corradini, Real-time gesture recognition by means of hybrid recognizers, in Proceedings of International Gesture Workshop on Gesture and Sign Languages in Human-Computer Interaction, 2002, pp. 34–47
R. Cutler, M. Turk, View-based interpretation of real-time optical flow for gesture recognition, in Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 1998, pp. 416–416
N. Dardas, Real-time hand gesture detection and recognition for human computer interaction, Ph.D. thesis, University of Ottawa, 2012
P. Doliotis, A. Stefan, C. Mcmurrough, D. Eckhard, V. Athitsos, Comparing gesture recognition accuracy using color and depth information, in Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, 2011, p. 20
J. Edmonds, Maximum matching and a polyhedron with 0, 1-vertices. J. Res. Natl. Bur. Stand. B 69, 125–130 (1965)
H. Ershaed, I. Al-Alali, N. Khasawneh, M. Fraiwan, An Arabic sign language computer interface using the Xbox Kinect, in Proceedings of the Annual Undergraduate Research Conference on Applied Computing, vol. 1, 2011
H. Escalante, I. Guyon, Principal Motion, 2012, http://www.causality.inf.ethz.ch/Gesture/principal_motion.pdf
H.J. Escalante, I. Guyon, V. Athitsos, P. Jangyodsuk, J. Wan, Principal motion components for gesture recognition using a single-example, 2013, arXiv:1310.4822
S.R. Fanello, I. Gori, G. Metta, F. Odone, One-shot learning for real-time action recognition, in Pattern Recognition and Image Analysis, 2013, pp. 31–40
G. Fang, W. Gao, D. Zhao, Large vocabulary sign language recognition based on fuzzy decision trees. IEEE Trans. Syst. Man Cybern. A 34(3), 305–314 (2004)
A. Fornés, S. Escalera, J. Lladós, E. Valveny, Symbol classification using dynamic aligned shape descriptor, in Proceedings of the 20th International Conference on Pattern Recognition, 2010, pp. 1957–1960
I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, B. Hamner, Results and analysis of the ChaLearn gesture challenge 2012, in Proceedings of International Workshop on Advances in Depth Image Analysis and Applications, 2013, pp. 186–204
I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, The ChaLearn gesture dataset (CGD 2011). Mach. Vis. Appl. 25(8), 1929–1951 (2014). doi:10.1007/s00138-014-0596-3
C.-L. Huang, W.-Y. Huang, Sign language recognition using model-based tracking and a 3D Hopfield neural network. Mach. Vis. Appl. 10(5–6), 292–307 (1998)
G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. B 42(2), 513–529 (2012)
Y.-S. Jeong, M.K. Jeong, O.A. Omitaomu, Weighted dynamic time warping for time series classification. Pattern Recognit. 44(9), 2231–2240 (2011)
T. Kadir, R. Bowden, E.J. Ong, A. Zisserman, Minimal training, large lexicon, unconstrained sign language recognition, in Proceedings of the British Machine Vision Conference, vol. 1, 2004, pp. 1–10
H.W. Kuhn, The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1–2), 83–97 (1955)
V.I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, in Soviet Physics Doklady, vol. 10, 1966, p. 707
J.F. Lichtenauer, E.A. Hendriks, M.J.T. Reinders, Sign language recognition by combining statistical DTW and independent classification. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 2040–2046 (2008)
S.K. Liddell, R.E. Johnson, American sign language. Sign Lang. Stud. 64, 195–278 (1989)
Y.M. Lui, Human gesture recognition on product manifolds. J. Mach. Learn. Res. 13(1), 3297–3321 (2012a)
Y.M. Lui, A least squares regression framework on manifolds and its application to gesture recognition, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012b, pp. 13–18
U. Mahbub, T. Roy, M.S. Rahman, H. Imtiaz, One-shot-learning gesture recognition using motion history based gesture silhouettes, in Proceedings of the International Conference on Industrial Application Engineering, 2013, pp. 186–193
M.R. Malgireddy, I. Inwogu, V. Govindaraju, A temporal Bayesian model for classifying, detecting and localizing activities in video sequences, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 43–48
M. Maraqa, R. Abu-Zaiter, Recognition of Arabic Sign Language (ArSL) using recurrent neural networks, in Proceedings of the First International Conference on the Applications of Digital Information and Web Technologies, 2008, pp. 478–481
T.H.H. Maung, Real-time hand tracking and gesture recognition system using neural networks. World Acad. Sci. Eng. Technol. 50, 466–470 (2009)
S. Mitra, T. Acharya, Gesture recognition: a survey. IEEE Trans. Syst. Man Cybern. C 37(3), 311–324 (2007)
S. Mu-Chun, A fuzzy rule-based approach to spatio-temporal hand gesture recognition. IEEE Trans. Syst. Man Cybern. C 30(2), 276–281 (2000)
K. Nickel, R. Stiefelhagen, Visual recognition of pointing gestures for human-robot interaction. Image Vis. Comput. 25(12), 1875–1884 (2007)
I. Oikonomidis, N. Kyriazis, A. Argyros, Efficient model-based 3D tracking of hand articulations using Kinect, in Proceedings of the British Machine Vision Conference, 2011, pp. 1–11
E.-J. Ong, R. Bowden, A boosted classifier tree for hand shape detection, in Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 889–894
A. Ramamoorthy, N. Vaswani, S. Chaudhury, S. Banerjee, Recognition of dynamic hand gestures. Pattern Recognit. 36(9), 2069–2081 (2003)
I. Rauschert, P. Agrawal, R. Sharma, S. Fuhrmann, I. Brewer, A. MacEachren, Designing a human-centered, multimodal GIS interface to support emergency management, in Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, 2002, pp. 119–124
Z. Ren, J. Yuan, J. Meng, Z. Zhang, Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans. Multimed. 15(5), 1110–1120 (2013)
M. Reyes, G. Dominguez, S. Escalera, Feature weighting in dynamic time warping for gesture recognition in depth data, in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011, pp. 1182–1188
Y. Sabinas, E.F. Morales, H.J. Escalante, A one-shot DTW-based method for early gesture recognition, in Proceedings of 18th Iberoamerican Congress on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 2013, pp. 439–446
H.J. Seo, P. Milanfar, Action recognition from one example. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 867–882 (2011)
L. Shao, L. Ji, Motion histogram analysis based key frame extraction for human action/activity representation, in Proceedings of Canadian Conference on Computer and Robot Vision, 2009, pp. 88–92
J. Shotton, A.W. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake, Real-time human pose recognition in parts from single depth images, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1297–1304
E. Stergiopoulou, N. Papamarkos, Hand gesture recognition using a neural network shape fitting technique. Eng. Appl. Artif. Intell. 22(8), 1141–1158 (2009)
W.C. Stokoe, Sign language structure: an outline of the visual communication systems of the American deaf. Studies in Linguistics, Occasional Papers, 8, 1960
C.P. Vogler, American Sign Language recognition: reducing the complexity of the task with phoneme-based modeling and parallel hidden Markov models, Ph.D. thesis, University of Pennsylvania, 2003
C. Vogler, D. Metaxas, Parallel hidden Markov models for American Sign Language recognition, in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1, 1999, pp. 116–122
J. Wachs, M. Kolsch, H. Stern, Y. Edan, Vision-based hand-gesture applications. Commun. ACM 54(2), 60–71 (2011)
J. Wan, Q. Ruan, G. An, W. Li, Gesture recognition based on hidden Markov model from sparse representative observations, in Proceedings of the IEEE 11th International Conference on Signal Processing, vol. 2, 2012a, pp. 1180–1183
J. Wan, Q. Ruan, G. An, W. Li, Hand tracking and segmentation via graph cuts and dynamic model in sign language videos, in Proceedings of IEEE 11th International Conference on Signal Processing, vol. 2 (IEEE, Piscataway, 2012b), pp. 1135–1138
J. Wan, Q. Ruan, W. Li, S. Deng, One-shot learning gesture recognition from RGB-D data using bag of features. J. Mach. Learn. Res. 14(1), 2549–2582 (2013)
J. Wan, V. Athitsos, P. Jangyodsuk, H.J. Escalante, Q. Ruan, I. Guyon, CSMMI: class-specific maximization of mutual information for action and gesture recognition. IEEE Trans. Image Process. 23(7), 3152–3165 (2014a)
J. Wan, Q. Ruan, W. Li, G. An, R. Zhao, 3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos. J. Electron. Imaging 23(2), 023017 (2014b)
C. Wang, W. Gao, S. Shan, An approach based on phonemes to large vocabulary Chinese sign language recognition, in Proceedings of the IEEE Conference on Automatic Face and Gesture Recognition, 2002, pp. 411–416
J. Wang, Z. Liu, Y. Wu, J. Yuan, Mining actionlet ensemble for action recognition with depth cameras, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1290–1297
S.-F. Wong, T.-K. Kim, R. Cipolla, Learning motion categories using both semantic and structural information, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–6
D. Wu, F. Zhu, L. Shao, One shot learning gesture recognition from RGBD images, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012a, pp. 7–12
S. Wu, F. Jiang, D. Zhao, S. Liu, W. Gao, Viewpoint-independent hand gesture recognition system, in Proceedings of the IEEE Conference on Visual Communications and Image Processing, 2012b, pp. 43–48
M. Zahedi, D. Keysers, H. Ney, Appearance-based recognition of words in American Sign Language, in Proceedings of Second Iberian Conference on Pattern Recognition and Image Analysis, 2005, pp. 511–519
L.-G. Zhang, Y. Chen, G. Fang, X. Chen, W. Gao, A vision-based sign language recognition system using tied-mixture density HMM, in Proceedings of the 6th International Conference on Multimodal Interfaces, 2004, pp. 198–204
Acknowledgements
We would like to thank the editors and reviewers, whose valuable comments greatly improved the manuscript. We would especially like to thank Escalante and Guyon, who kindly provided the principal motion source code, and Microsoft Asia, which kindly provided two sets of Kinect devices. This work was supported in part by the Major State Basic Research Development Program of China (973 Program, 2015CB351804) and the National Natural Science Foundation of China under Grants 61272386, 61100096, and 61300111.
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Jiang, F., Zhang, S., Wu, S., Gao, Y., Zhao, D. (2017). Multi-layered Gesture Recognition with Kinect. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1