HandSense: smart multimodal hand gesture recognition based on deep neural networks

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Hand gesture recognition (HGR) is a promising enabler for human–computer interaction (HCI). Hand gestures are normally classified into multi-modal actions, including static gestures, fine-grained dynamic gestures, and coarse-grained dynamic gestures. Among these, fine-grained action detection is limited when the relevant image region is small. To solve this problem, we propose HandSense, a new system for multi-modal HGR based on combined RGB and depth cameras, which improves the fine-grained action descriptors while preserving the ability to perform general action recognition. First, two interconnected 3D convolutional neural networks (3D-CNNs) are employed to extract spatial–temporal features from the RGB and depth images. Second, these spatial–temporal features are integrated into a fusion feature. Finally, a Support Vector Machine (SVM) is used to recognize different gestures based on the fusion feature. To validate the effectiveness of HandSense, extensive experiments are conducted on a public gesture dataset, the SKIG hand gesture dataset. In addition, the feasibility of the proposed system is demonstrated on a challenging multi-modal RGB-Depth hand gesture dataset.
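The last two stages of the pipeline described in the abstract (feature fusion followed by SVM classification) can be illustrated with a minimal sketch. Several things here are assumptions rather than details taken from the article: the fusion operator is taken to be L2-normalization followed by concatenation, the hand-written 2-dimensional vectors stand in for 3D-CNN outputs, the classifier is a two-class linear SVM trained by subgradient descent on the hinge loss, and all function names (`fuse`, `train_linear_svm`, `predict`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors pass through)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(rgb_feat, depth_feat):
    """Feature-level fusion by concatenation (assumed operator, not from the article)."""
    return l2_normalize(rgb_feat) + l2_normalize(depth_feat)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples, labels, epochs=100, lr=0.1, lam=0.01):
    """Binary linear SVM: subgradient descent on hinge loss with L2 regularization.

    Labels must be +1 or -1. Returns the weight vector and bias.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, labels):
            violated = t * (dot(w, x) + b) < 1  # margin check before the update
            w = [wi * (1 - lr * lam) for wi in w]  # shrinkage from the L2 term
            if violated:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Toy stand-ins for per-clip 3D-CNN features: (RGB features, depth features, label).
clips = [
    ([1.0, 0.1], [0.9, 0.2], 1),
    ([0.8, 0.2], [1.0, 0.1], 1),
    ([0.1, 1.0], [0.2, 0.9], -1),
    ([0.2, 0.8], [0.1, 1.0], -1),
]
fused = [fuse(rgb, depth) for rgb, depth, _ in clips]
labels = [label for _, _, label in clips]

w, b = train_linear_svm(fused, labels)
train_predictions = [predict(w, b, x) for x in fused]
new_prediction = predict(w, b, fuse([0.9, 0.3], [0.7, 0.1]))
```

A real system would need a multi-class classifier (for example one-vs-rest over several such binary SVMs) and features produced by trained networks; the sketch only shows how a fused vector from two modalities feeds a single margin-based classifier.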

Acknowledgements

We thank the reviewers for their careful review and valuable suggestions. This work was supported in part by the National Natural Science Foundation of China (61301126, 61471077), the Program for Changjiang Scholars and Innovative Research Team in University (IRT1299), the Special Fund of Chongqing Key Laboratory (CSTC), the Fundamental and Frontier Research Project of Chongqing (cstc2017jcyjAX0380, cstc2015jcyjBX0065), and the University Outstanding Achievement Transformation Project of Chongqing (KJZH17117).

Author information

Corresponding author

Correspondence to Zhenyuan Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Z., Tian, Z. & Zhou, M. HandSense: smart multimodal hand gesture recognition based on deep neural networks. J Ambient Intell Human Comput 15, 1557–1572 (2024). https://doi.org/10.1007/s12652-018-0989-7

  • DOI: https://doi.org/10.1007/s12652-018-0989-7