Abstract
The effective and reliable detection and classification of dynamic hand gestures is a key element in building Natural User Interfaces, i.e. systems that let users interact through free body movements instead of traditional mechanical tools. However, methods that temporally segment and classify dynamic gestures usually rely on a large amount of labeled data, including annotations of both the class and the temporal boundaries of each gesture. In this paper, we propose an unsupervised approach to train a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence. The input data consists of the 3D positions of the hand joints, along with their speed and acceleration, collected with a Leap Motion device. Experimental results show promising accuracy on both the detection and the classification tasks while requiring only limited computational power, confirming that the proposed method is suitable for real-world applications.
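The paper itself includes no code; as a rough illustration, the input representation described in the abstract (3D joint positions augmented with speed and acceleration) could be assembled with finite differences over the tracked sequence. The 21-joint layout and per-frame time units below are assumptions, not details taken from the paper:

```python
import numpy as np

def hand_features(joints):
    """Build per-frame features from a (T, J, 3) array of 3D joint positions.

    Speed and acceleration are approximated with first and second finite
    differences along the time axis (per-frame units); positions, velocities,
    and accelerations are concatenated into a (T, J, 9) feature array.
    """
    velocity = np.gradient(joints, axis=0)        # first temporal derivative
    acceleration = np.gradient(velocity, axis=0)  # second temporal derivative
    return np.concatenate([joints, velocity, acceleration], axis=-1)

# Example: 60 frames of a hypothetical 21-joint hand skeleton
seq = np.random.rand(60, 21, 3)
features = hand_features(seq)
print(features.shape)  # (60, 21, 9)
```

A sequence of such per-frame feature vectors is the kind of input a Transformer encoder can consume directly, one token per frame.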
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
D’Eusanio, A., Pini, S., Borghi, G., Simoni, A., Vezzani, R. (2022). Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13231. Springer, Cham. https://doi.org/10.1007/978-3-031-06427-2_35
Print ISBN: 978-3-031-06426-5
Online ISBN: 978-3-031-06427-2