Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model

Li, Jinghua; Huai, Huarui; Gao, Junbin; Kong, Dehui; Wang, Lichun

doi:10.1007/s12193-019-00304-z

Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model

Original Paper
Published: 14 May 2019

Volume 13, pages 363–371, (2019)
Cite this article

Journal on Multimodal User Interfaces Aims and scope Submit manuscript

Jinghua Li ORCID: orcid.org/0000-0002-5583-8260¹,
Huarui Huai¹,
Junbin Gao²,
Dehui Kong¹ &
…
Lichun Wang¹

596 Accesses
8 Citations
6 Altmetric
Explore all metrics

Abstract

Hand gesture is a kind of natural interaction way and hand gesture recognition has recently become more and more popular in human–computer interaction. However, the complexity and variations of hand gesture like various illuminations, views, and self-structural characteristics make the hand gesture recognition still challengeable. How to design an appropriate feature representation and classifier are the core problems. To this end, this paper develops an expressive deep hybrid hand gesture recognition architecture called CNN-MVRBM-NN. The framework consists of three submodels. The CNN submodel automatically extracts frame-level spatial features, and the MVRBM submodel fuses spatial information over time for training higher level semantics inherent in hand gesture, while the NN submodel classifies hand gesture, which is initialized by MVRBM for second order data representation, and then such NN pre-trained by MVRBM can be fine-tuned by back propagation so as to be more discriminative. The experimental results on Cambridge Hand Gesture Data set show the proposed hybrid CNN-MVRBM-NN has obtained the state-of-the-art recognition performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 5

HandSense: smart multimodal hand gesture recognition based on deep neural networks

Article 23 August 2018

mIV3Net: modified inception V3 network for hand gesture recognition

Article 23 June 2023

Dynamic Gesture Recognition Based on Deep 3D Natural Networks

Article 24 July 2023

References

Chen Fs F, Cm HC (2003) Hand gesture recognition using a real-time tracking method and hidden markov models. Image Vis Comput 21(8):745–758
Article Google Scholar
Auephanwiriyakul S, Phitakwinai S (2013) Thai sign language translation using scale invariant feature transform and hidden markov models. Pattern Recognit Lett 34(11):1291–1298
Article Google Scholar
Lukas P, Yuji O, Yoshihiko M, Hiroshi I (2014) A hog-based hand gesture recognition system on a mobile device. In ICIP, pp 3973–3977
Ss R, Agrawal A (2015) Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 43:1–54
Article Google Scholar
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. NIPS 25:1097–1105
Google Scholar
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Comput Sci, pp 1409–1556
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S et al (2015) Going deeper with convolutions. In: CVPR, pp 1–9
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: NIPS, pp 568–576
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp 580–587
Ji SW, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Article Google Scholar
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li FF (2014)Large-scale video classification with convolutional neural networks. In: CVPR, pp 1725–1732
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5789):504–507
Article MathSciNet Google Scholar
Qi GL, Sun YF, Gao JB, Hu YL, Li JH (2016) Matrix variate rbm and its applications. In: IJCNN, pp 389–395
Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden markov model. In: CVPR, pp 379–385
Ramamoorthy A, Vaswani N, Chaudhury S, Banerjee S (2003) Recognition of dynamic hand gestures. Pattern Recognit 36(9):2069–2081
Article Google Scholar
Liang B, Zheng L (2014) Multi-modal gesture recognition using skeletal joints and motion trail model. In: European conference on computer vision, pp 623–638
Chapter Google Scholar
Xy WANG, Gz DAI, Xw ZHANG, Fj Z (2008) Recognition of complex dynamic gesture based on HMM-FNN model. J Softw 19(9):2302–2312 (in Chinese)
Article Google Scholar
Singha J, Hussain RL (2016) Recognition of global hand gestures using self co-articulation information and classifier fusion. J Multimodal User Interfaces 10(1):77–93
Article Google Scholar
Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Book Google Scholar
Nguyen TD, Tran T, Phung D, Venkatesh S (2015) Tensor variate restricted Boltzmann machines. In: AAAI, pp 2887–2893
Sun Y, Wang X, Tang X (2015) Deeply learned face representations are sparse, selective, and robust. In: CVPR, pp 2892–2900
Kim TK, Cipolla R (2009) Canonical correlation analysis of video volume tensors for action categorization and detection. IEEE Trans Pattern Anal Mach Intell 31(8):1415–1428
Article Google Scholar
Lui YM (2012) Human gesture recognition on product manifolds. J Mach Learn Res 13(1):3297–3321
MathSciNet MATH Google Scholar
Shen CH, Harandi M, Hartley R (2015) Extrinsic methods for coding and dictionary learning on Grassmann manifolds. Int J Comput Vis 114(2):113–136
MathSciNet MATH Google Scholar
Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
Article Google Scholar

Download references

Acknowledgements

This research is supported by the Natural Science Foundation of China (Nos. 61402024, 61772049, 61632006, 61876012), Beijing Natural Science Foundation (Nos. 4162009), Beijing Educational Committee(No. KM201710005022), Beijing Key Laboratory of Computational Intelligence and Intelligent System. We would also like to thank PhD candidate Wentong Wang for his genuine suggestion in CNN program design.

Author information

Authors and Affiliations

Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Jinghua Li, Huarui Huai, Dehui Kong & Lichun Wang
Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Sydney, NSW, 2006, Australia
Junbin Gao

Authors

Jinghua Li
View author publications
You can also search for this author in PubMed Google Scholar
Huarui Huai
View author publications
You can also search for this author in PubMed Google Scholar
Junbin Gao
View author publications
You can also search for this author in PubMed Google Scholar
Dehui Kong
View author publications
You can also search for this author in PubMed Google Scholar
Lichun Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jinghua Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, J., Huai, H., Gao, J. et al. Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model. J Multimodal User Interfaces 13, 363–371 (2019). https://doi.org/10.1007/s12193-019-00304-z

Download citation

Received: 03 August 2017
Accepted: 06 May 2019
Published: 14 May 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s12193-019-00304-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model

Abstract

Access this article

Similar content being viewed by others

HandSense: smart multimodal hand gesture recognition based on deep neural networks

mIV3Net: modified inception V3 network for hand gesture recognition

Dynamic Gesture Recognition Based on Deep 3D Natural Networks

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model

Abstract

Access this article

Similar content being viewed by others

HandSense: smart multimodal hand gesture recognition based on deep neural networks

mIV3Net: modified inception V3 network for hand gesture recognition

Dynamic Gesture Recognition Based on Deep 3D Natural Networks

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation