Abstract
Gesture detection has recently attracted considerable attention because of its wide range of applications, notably in human–computer interaction (HCI). In video-based gesture recognition, however, background elements unrelated to the gesture lower the system’s classification rate. This paper presents an algorithm for recognizing large-scale gestures. Training uses RGB-D videos in which the depth modality is estimated from the RGB modality with a U-Net; the estimated depth is subsequently used for testing as well. As a result, real-time use of the proposed dynamic hand gesture recognition (DHGR) system requires only RGB video. The algorithm first creates two dynamic images, one from the estimated depth video and one from the RGB video. Dynamic images generated from RGB video excel at capturing spatial information, while those derived from depth video excel at encoding temporal aspects. The two dynamic images are merged to form an RGB-D dynamic image (RDDI), which is then fed into a modified Xception-based CNN model for gesture classification. To evaluate the system, we conducted experiments on the EgoGesture and MSR Gesture datasets. The system achieves a classification accuracy of 91.64% on EgoGesture and 99.41% on MSR Gesture, outperforming some existing techniques.
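The abstract does not say how the dynamic images are computed or how the two are merged, so the following is only a minimal sketch under stated assumptions. It uses the simplified approximate rank pooling weights (alpha_t = 2t - T - 1) that are commonly used to build dynamic images, and it assumes the merge is a plain channel concatenation. The function names rank_pooling_weights, dynamic_image and merge_rddi are hypothetical and are not taken from the paper.

```python
import numpy as np


def rank_pooling_weights(num_frames: int) -> np.ndarray:
    """Simplified approximate rank pooling weights: alpha_t = 2t - T - 1, t = 1..T."""
    t = np.arange(1, num_frames + 1, dtype=np.float64)
    return 2.0 * t - num_frames - 1.0


def dynamic_image(video: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, C) video into a single (H, W, C) uint8 image."""
    weights = rank_pooling_weights(video.shape[0])
    # Weighted temporal sum: late frames get positive weight, early frames negative.
    pooled = np.tensordot(weights, video.astype(np.float64), axes=(0, 0))
    # Min-max rescale to [0, 255]; a video with no motion pools to a constant.
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        return np.zeros(pooled.shape, dtype=np.uint8)
    return np.round(255.0 * (pooled - lo) / (hi - lo)).astype(np.uint8)


def merge_rddi(rgb_video: np.ndarray, depth_video: np.ndarray) -> np.ndarray:
    """Assumed fusion: stack the RGB and depth dynamic images along the channel axis."""
    rgb_di = dynamic_image(rgb_video)      # (H, W, 3)
    depth_di = dynamic_image(depth_video)  # (H, W, 1)
    return np.concatenate([rgb_di, depth_di], axis=-1)  # (H, W, 4)
```

Because the weights sum to zero, a static clip produces an all-zero dynamic image, so only pixels that change over time contribute. The four-channel result would need a classifier whose input layer accepts four channels; whether the modified Xception model in the paper does this, or fuses the images in some other way, is not stated in the abstract.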
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Karsh, B., Laskar, R.H. & Karsh, R.K. mXception and dynamic image for hand gesture recognition. Neural Comput & Applic 36, 8281–8300 (2024). https://doi.org/10.1007/s00521-024-09509-0
DOI: https://doi.org/10.1007/s00521-024-09509-0