Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13231)

Abstract

The effective and reliable detection and classification of dynamic hand gestures is a key element for building Natural User Interfaces, i.e. systems that allow users to interact through free movements of their body instead of traditional mechanical tools. However, methods that temporally segment and classify dynamic gestures usually rely on a large amount of labeled data, including annotations of both the class and the temporal segmentation of each gesture. In this paper, we propose an unsupervised approach to train a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence. The input data is represented by the 3D positions of the hand joints, along with their speed and acceleration, collected through a Leap Motion device. Experimental results show promising accuracy on both the detection and the classification tasks while requiring only limited computational power, confirming that the proposed method can be applied in real-world applications.
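The input representation described above (joint positions augmented with their temporal derivatives) can be sketched as follows. This is a minimal illustration, not the authors' code: the joint count (21), frame rate (60 fps), and the use of central finite differences via `np.gradient` are assumptions for the example.

```python
import numpy as np

def joint_features(positions, dt=1.0 / 60.0):
    """Build per-frame features from a sequence of 3D hand-joint positions.

    positions: array of shape (T, J, 3) -- T frames, J joints, xyz coordinates.
    Returns an array of shape (T, J, 9): position, speed, acceleration per joint.
    """
    velocity = np.gradient(positions, dt, axis=0)      # first temporal derivative
    acceleration = np.gradient(velocity, dt, axis=0)   # second temporal derivative
    return np.concatenate([positions, velocity, acceleration], axis=-1)

# Example: 120 frames of 21 hand joints (an assumed skeleton size)
seq = np.random.randn(120, 21, 3)
feats = joint_features(seq)
print(feats.shape)  # (120, 21, 9)
```

The resulting per-frame feature vectors would then be flattened and fed as a token sequence to the Transformer encoder for temporal segmentation.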


Notes

  1. https://aimagelab.ing.unimore.it/go/unsupervised-gesture-segmentation.


Author information

Correspondence to Alessandro Simoni.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

D’Eusanio, A., Pini, S., Borghi, G., Simoni, A., Vezzani, R. (2022). Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13231. Springer, Cham. https://doi.org/10.1007/978-3-031-06427-2_35

  • DOI: https://doi.org/10.1007/978-3-031-06427-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06426-5

  • Online ISBN: 978-3-031-06427-2

  • eBook Packages: Computer Science (R0)
