Gesture Recognition and Multi-modal Fusion on a New Hand Gesture Dataset

  • Conference paper
Pattern Recognition Applications and Methods (ICPRAM 2021, ICPRAM 2022)

Abstract

We present a baseline for gesture recognition using state-of-the-art sequence classifiers on a new, freely available multi-modal dataset of free-hand gestures. The dataset consists of roughly 100,000 samples, grouped into six classes of typical, easy-to-learn hand gestures. It was recorded with two independent sensors, enabling experiments on multi-modal data fusion at several depths of the processing chain, i.e., early, intermediate, and late fusion. Since the whole dataset was recorded by a single person, data quality is very high, with little to no risk of incorrectly performed gestures. We report results of unimodal sequence classification with both an LSTM and a CNN classifier. We further show that fusing all four modalities by late fusion of the output layers of LSTM classifiers, each trained on a single modality, results in higher precision. Finally, we demonstrate live gesture classification with an LSTM-based classifier, showing that generalization to other persons performing the gestures is high.
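To make the late-fusion step concrete, below is a minimal sketch, not the authors' implementation, of the scheme the abstract describes: one LSTM classifier per modality, fused by averaging the per-modality class probabilities. The framework (PyTorch), layer sizes, and per-modality feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 6      # six gesture classes, as stated in the abstract
NUM_MODALITIES = 4   # four modalities, as stated in the abstract


class UnimodalLSTM(nn.Module):
    """LSTM sequence classifier for a single modality."""

    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, NUM_CLASSES)

    def forward(self, x):            # x: (batch, time, input_dim)
        _, (h_n, _) = self.lstm(x)   # final hidden state of each sequence
        return self.head(h_n[-1])    # class logits, shape (batch, NUM_CLASSES)


def late_fusion(logits_per_modality):
    """Late fusion: average the class probabilities of the unimodal classifiers."""
    probs = [torch.softmax(logits, dim=-1) for logits in logits_per_modality]
    return torch.stack(probs).mean(dim=0)


# Usage with dummy data; the feature dimensions per modality are hypothetical.
feature_dims = (63, 63, 3, 3)
models = [UnimodalLSTM(d) for d in feature_dims]
batch = [torch.randn(8, 40, d) for d in feature_dims]  # 8 sequences, 40 frames
fused = late_fusion([m(x) for m, x in zip(models, batch)])
prediction = fused.argmax(dim=-1)  # fused class decision per sequence
```

Because each classifier is trained independently on its own modality, this kind of fusion requires no joint retraining; only the output distributions are combined at decision time.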



Author information

Correspondence to Monika Schak.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Schak, M., Gepperth, A. (2023). Gesture Recognition and Multi-modal Fusion on a New Hand Gesture Dataset. In: De Marsico, M., Sanniti di Baja, G., Fred, A. (eds) Pattern Recognition Applications and Methods. ICPRAM 2021, ICPRAM 2022. Lecture Notes in Computer Science, vol 13822. Springer, Cham. https://doi.org/10.1007/978-3-031-24538-1_4

  • DOI: https://doi.org/10.1007/978-3-031-24538-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24537-4

  • Online ISBN: 978-3-031-24538-1

  • eBook Packages: Computer Science, Computer Science (R0)
