Abstract
Dynamic hand gesture recognition faces two key problems: large intra-class spatial-temporal variability (gesture types defined without regard to hand configuration) and similar inter-class motion patterns (gesture types distinguished only by hand configuration). First, to handle intra-class variability, the spatial-temporal variability itself must be reduced. Because averaging improves robustness, we propose a motion pattern descriptor, Time-Wise Histograms of Oriented Gradients (TWHOG), which extracts averaged spatial-temporal information in the space-time domain from three orthogonal projection views (XY, YT, XT). Second, when different classes share similar motion patterns, an accurate representation of hand configuration becomes especially important: differences in detail must be fully captured, and a shape descriptor can amplify such subtle differences. Specifically, we introduce Depth Motion Maps-based Histograms of Oriented Gradients (DMM-HOG) to capture subtle differences in hand configuration between gesture types with similar motion patterns. Finally, we concatenate TWHOG and DMM-HOG to form the final feature vector, Time-Shape Histograms of Oriented Gradients (TSHOG), and verify the effectiveness of this concatenation from both quantitative and qualitative perspectives. Comparison studies with state-of-the-art approaches are conducted on two challenging depth gesture datasets (MSRGesture3D and SKIG). The experimental results show that TSHOG achieves satisfactory performance while keeping a relatively simple model with lower complexity and higher generality.
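The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact construction of TWHOG or DMM-HOG, so the code below assumes that the three views are obtained by averaging a (T, H, W) depth volume along each axis, that the depth motion map is the accumulated absolute frame difference of the front view only, and that a single magnitude-weighted orientation histogram stands in for a full cell-and-block HOG. All function names and the bin count are hypothetical.

```python
import numpy as np

def orientation_histogram(image, bins=9):
    """Simplified single-cell HOG: magnitude-weighted histogram of
    unsigned gradient orientations, L2-normalised."""
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi]
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def time_wise_descriptor(volume, bins=9):
    """Assumed reading of TWHOG: average the (T, H, W) volume along each
    axis to obtain XY, YT and XT views, then describe each view."""
    views = [
        volume.mean(axis=0),  # XY view: average over time, shape (H, W)
        volume.mean(axis=2),  # YT view: average over x, shape (T, H)
        volume.mean(axis=1),  # XT view: average over y, shape (T, W)
    ]
    return np.concatenate([orientation_histogram(v, bins) for v in views])

def depth_motion_map(volume):
    """Front-view depth motion map: sum of absolute differences
    between consecutive frames."""
    return np.abs(np.diff(volume.astype(np.float64), axis=0)).sum(axis=0)

def combined_descriptor(volume, bins=9):
    """Concatenate the motion part and the shape part into one vector."""
    motion = time_wise_descriptor(volume, bins)
    shape = orientation_histogram(depth_motion_map(volume), bins)
    return np.concatenate([motion, shape])

# Usage on a synthetic clip of 16 frames, 32 x 32 pixels each.
rng = np.random.default_rng(0)
clip = rng.random((16, 32, 32))
feature = combined_descriptor(clip)
print(feature.shape)  # (36,) = 3 views * 9 bins + 9 bins
```

In this sketch the concatenated vector could be passed to any linear classifier; the choice of classifier and any dimensionality reduction are left open here.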













Cite this article
Xing, M., Hu, J., Feng, Z. et al. Dynamic hand gesture recognition using motion pattern and shape descriptors. Multimed Tools Appl 78, 10649–10672 (2019). https://doi.org/10.1007/s11042-018-6553-9
DOI: https://doi.org/10.1007/s11042-018-6553-9