mXception and dynamic image for hand gesture recognition

Karsh, Bhumika; Laskar, Rabul Hussain; Karsh, Ram Kumar

doi:10.1007/s00521-024-09509-0

Original Article
Published: 17 February 2024

Volume 36, pages 8281–8300, (2024)

Neural Computing and Applications


Abstract

Gesture detection has recently attracted considerable attention because of its wide range of applications, notably in human–computer interaction (HCI). In video-based gesture recognition, however, background elements unrelated to the gesture reduce the system's classification rate. This paper presents an algorithm for large-scale gesture recognition. In the training phase, we use RGB-D videos, where the depth-modality videos are derived from the RGB-modality videos using UNET and subsequently employed for testing. Note, however, that real-time applications of the proposed dynamic hand gesture recognition (DHGR) system need only RGB-modality videos. The algorithm begins by creating two dynamic images: one from the estimated depth video and the other from the RGB video. Dynamic images generated from the RGB video excel at capturing spatial information, while those derived from the depth video excel at encoding temporal aspects. The two dynamic images are merged to form an RGB-D dynamic image (RDDI), which is then fed into a modified Xception-based CNN model for gesture classification and recognition. To evaluate the system's performance, we conducted experiments on the EgoGesture and MSR Gesture datasets. The system achieved a classification accuracy of 91.64% on the EgoGesture dataset and 99.41% on the MSR Gesture dataset, outperforming some existing techniques.
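The dynamic-image step described in the abstract can be illustrated with a small NumPy sketch. The abstract does not say how the dynamic images are computed or how the two are merged, so everything here is an assumption: the temporal weights are the simplified approximate-rank-pooling coefficients commonly used for dynamic images (alpha_t = 2t - T - 1), the fusion is a plain channel concatenation, and the function names `dynamic_image`, `rescale_to_uint8`, and `fuse_rgbd` are hypothetical. The UNET depth estimation and the modified Xception classifier are not shown.

```python
import numpy as np


def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a video of shape (T, H, W, C) into one image of shape (H, W, C).

    Assumption: simplified approximate rank pooling, where frame t (1-based)
    is weighted by alpha_t = 2t - T - 1. Later frames get positive weights and
    earlier frames negative ones, so the result encodes temporal order.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    t = np.arange(1, num_frames + 1)
    alpha = 2.0 * t - num_frames - 1.0
    # Weighted sum over the time axis.
    return np.tensordot(alpha, frames, axes=(0, 0))


def rescale_to_uint8(image: np.ndarray) -> np.ndarray:
    """Min-max rescale a real-valued dynamic image to the 0..255 range."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image carries no motion information; map it to zeros.
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round(255.0 * (image - lo) / (hi - lo)).astype(np.uint8)


def fuse_rgbd(rgb_di: np.ndarray, depth_di: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: stack RGB and depth dynamic images along channels."""
    if depth_di.ndim == 2:
        depth_di = depth_di[..., np.newaxis]
    return np.concatenate([rgb_di, depth_di], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_video = rng.random((16, 32, 32, 3))    # stand-in for an RGB clip
    depth_video = rng.random((16, 32, 32, 1))  # stand-in for estimated depth
    rddi = fuse_rgbd(
        rescale_to_uint8(dynamic_image(rgb_video)),
        rescale_to_uint8(dynamic_image(depth_video)),
    )
    print(rddi.shape)
```

Because the weights sum to zero, a perfectly static clip maps to an all-zero dynamic image, which is one way to see why this representation suppresses unchanging background content.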

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

Speech and Image Processing Laboratory, Electronics and Communication Engineering Department, National Institute of Technology, Silchar, Assam, 788010, India
Bhumika Karsh, Rabul Hussain Laskar & Ram Kumar Karsh

Corresponding author

Correspondence to Bhumika Karsh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Karsh, B., Laskar, R.H. & Karsh, R.K. mXception and dynamic image for hand gesture recognition. Neural Comput & Applic 36, 8281–8300 (2024). https://doi.org/10.1007/s00521-024-09509-0

Received: 15 December 2022
Accepted: 14 January 2024
Published: 17 February 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s00521-024-09509-0
