
Fusing shape and spatio-temporal features for depth-based dynamic hand gesture recognition

Multimedia Tools and Applications

Abstract

Hand gesture recognition has many practical applications, including human-computer interfaces. Many depth-based features have been proposed for dynamic hand gesture recognition, but their performance remains unsatisfactory because they cannot efficiently capture both effective shape information and the detailed variation of hands in the spatial and temporal domains. In this paper, we propose a new effective descriptor, DLEH2, for depth-based dynamic hand gesture recognition, developed from the characteristics of dynamic hand gestures by fusing simple shape and spatio-temporal features of depth sequences. For shape information, depth motion maps (DMMs) are first employed to obtain the 3D structure and shape information of hands. To enhance critical shape cues, the local texture and edge information of the three DMMs of a hand gesture sequence are captured using the DLE descriptor. However, DMMs compress the temporal information of the depth sequences into the spatial domain, which loses discrimination that is critical for recognizing temporal sequences. Simple but effective spatio-temporal features, HOG2, are therefore concatenated with DLE to compensate for the temporal information lost during DMM generation and to capture the detailed spatial and temporal variation of hands. Experimental results on two public benchmark datasets, 99.10% on the MSRGesture3D dataset and 98.43% on the SKIG dataset, show that the proposed fusion scheme outperforms state-of-the-art methods.
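To make the described pipeline concrete, the following Python sketch illustrates one way the fused descriptor could be assembled: DMMs are computed from three orthogonal projections of the depth sequence, a simplified DLE stand-in (uniform-LBP texture histogram plus a gradient-orientation edge histogram) is extracted from each DMM, HOG2-style features are computed from the frame sequence, and everything is concatenated and classified with a linear SVM. All function names, projection details, and parameter values are assumptions introduced here for illustration, not the authors' exact implementation.

```python
# Minimal sketch of the fused shape + spatio-temporal descriptor described above.
# All names, parameters and projection details are illustrative assumptions.
import numpy as np
from skimage.feature import local_binary_pattern, hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

DEPTH_BINS = 64                 # resolution of the side/top projections (assumed)
FRAME_SIZE = (64, 64)           # frames are resized so HOG vectors have a fixed length

def project(frame, view):
    """Crude orthogonal projection of one depth frame (H x W array of depth values)."""
    if view == "front":
        return frame.astype(float)
    h, w = frame.shape
    z = np.minimum((frame / (frame.max() + 1e-6) * DEPTH_BINS).astype(int), DEPTH_BINS - 1)
    rows, cols = np.nonzero(frame > 0)
    if view == "side":                                   # (height, depth) plane
        out = np.zeros((h, DEPTH_BINS))
        out[rows, z[rows, cols]] = 1.0
    else:                                                # "top": (depth, width) plane
        out = np.zeros((DEPTH_BINS, w))
        out[z[rows, cols], cols] = 1.0
    return out

def depth_motion_map(depth_seq, view):
    """DMM: accumulated absolute motion between consecutive projected frames."""
    projs = [project(f, view) for f in depth_seq]
    return sum(np.abs(b - a) for a, b in zip(projs[:-1], projs[1:]))

def dle_features(dmm, points=8, radius=1, edge_bins=9):
    """Simplified DLE stand-in: uniform-LBP texture histogram plus a
    gradient-orientation (edge) histogram computed on one DMM."""
    lbp = local_binary_pattern(dmm, points, radius, method="uniform")
    texture, _ = np.histogram(lbp, bins=np.arange(points + 3), density=True)
    gy, gx = np.gradient(dmm)
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
    edges, _ = np.histogram(ang, bins=edge_bins, range=(-np.pi, np.pi), weights=mag)
    return np.concatenate([texture, edges / (edges.sum() + 1e-6)])

def hog2_features(depth_seq, time_steps=32):
    """HOG^2-style descriptor: per-frame HOG vectors are stacked over time and a
    second HOG pass summarizes how they vary."""
    per_frame = [hog(resize(f, FRAME_SIZE), orientations=9,
                     pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                 for f in depth_seq]
    stacked = resize(np.stack(per_frame), (time_steps, len(per_frame[0])))
    return hog(stacked, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def fused_descriptor(depth_seq):
    """Concatenate the DLE features of the three DMMs with the HOG^2 features."""
    dle = [dle_features(depth_motion_map(depth_seq, v)) for v in ("front", "side", "top")]
    return np.concatenate(dle + [hog2_features(depth_seq)])

def train_classifier(descriptors, labels):
    """Fit a linear SVM (a stand-in for the paper's classifier) on fused descriptors."""
    return LinearSVC(C=1.0).fit(np.stack(descriptors), labels)
```

In this sketch the second HOG pass is applied to a fixed-size resampling of the stacked per-frame histograms so that sequences of different lengths yield descriptors of equal dimension; the exact DMM projections, DLE parameters, and normalization used in the paper should be taken from the full text.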



Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (no. 61304262).

Author information

Correspondence to Jinqing Zheng.

Cite this article

Zheng, J., Feng, Z., Xu, C. et al. Fusing shape and spatio-temporal features for depth-based dynamic hand gesture recognition. Multimed Tools Appl 76, 20525–20544 (2017). https://doi.org/10.1007/s11042-016-3988-8

