Abstract
Hand gesture recognition has many practical applications, including human-computer interfaces. Many depth-based features have been proposed for dynamic hand gesture recognition. However, performance remains unsatisfactory because these features cannot efficiently capture both effective shape information and the detailed variation of hands in the spatial and temporal domains. In this paper, we propose a new descriptor, DLEH2, for depth-based dynamic hand gesture recognition. It is designed around the characteristics of dynamic hand gestures and fuses simple shape features with spatio-temporal features of depth sequences. For shape information, depth motion maps (DMMs) are first employed to obtain the 3D structure and shape of the hands. To enhance critical shape cues, the local texture and edge information of the three DMMs of each hand gesture sequence are captured using the DLE descriptor. However, DMMs compress the temporal information of a depth sequence into the spatial domain, which loses, to some degree, cues that are critical for discriminating temporal sequences. Simple but effective spatio-temporal features, HOG2, are therefore concatenated with DLE to compensate for the temporal information lost during DMM generation and to capture the detailed spatial and temporal variation of hands. Experimental results on two public benchmark datasets, 99.10% on MSRGesture3D and 98.43% on SKIG, show that the proposed fusion scheme outperforms state-of-the-art methods.
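The DMM and fusion steps described above can be illustrated with a minimal sketch. It assumes a formulation common in the DMM literature (project each depth frame onto front, side, and top planes, then accumulate absolute differences between consecutive projections); the abstract does not state the exact variant used here. The function names, the `depth_bins` parameter, and the L2 normalization before concatenation are illustrative assumptions, and the DLE and HOG2 extractors themselves are not shown.

```python
import numpy as np

def project_views(frame, depth_bins):
    """Project one depth frame (H x W, integer depths in [0, depth_bins))
    onto front, side, and top planes. Zero depth is treated as background."""
    h, w = frame.shape
    ys, xs = np.nonzero(frame)      # foreground pixel coordinates
    zs = frame[ys, xs]              # their depth values
    front = frame.astype(np.float64)
    side = np.zeros((h, depth_bins))
    top = np.zeros((depth_bins, w))
    side[ys, zs] = 1.0              # occupancy in the (y, z) plane
    top[zs, xs] = 1.0               # occupancy in the (z, x) plane
    return front, side, top

def depth_motion_maps(frames, depth_bins):
    """Return [front, side, top] DMMs: per-view sums of absolute
    differences between consecutive projected frames."""
    views = [project_views(f, depth_bins) for f in frames]
    dmms = []
    for v in range(3):
        stack = np.stack([p[v] for p in views])
        dmms.append(np.abs(np.diff(stack, axis=0)).sum(axis=0))
    return dmms

def fuse(shape_feat, st_feat):
    """Concatenate a shape feature vector with a spatio-temporal one.
    L2-normalizing each part first is an assumption, not from the paper."""
    def l2(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(shape_feat), l2(st_feat)])
```

Because the temporal differences are summed, the order of frames is lost in each DMM, which is the temporal information loss that the concatenated spatio-temporal features are meant to offset.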
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (no. 61304262).
Cite this article
Zheng, J., Feng, Z., Xu, C. et al. Fusing shape and spatio-temporal features for depth-based dynamic hand gesture recognition. Multimed Tools Appl 76, 20525–20544 (2017). https://doi.org/10.1007/s11042-016-3988-8
DOI: https://doi.org/10.1007/s11042-016-3988-8