Abstract
We present a baseline for gesture recognition using state-of-the-art sequence classifiers on a new, freely available multi-modal dataset of free-hand gestures. The dataset consists of roughly 100,000 samples grouped into six classes of typical, easy-to-learn hand gestures. It was recorded with two independent sensors, enabling research on multi-modal data fusion with early, intermediate, and late fusion techniques. Since the entire dataset was recorded by a single person, data quality is high and the risk of incorrectly performed gestures is minimal. We report results on unimodal sequence classification using an LSTM as well as a CNN classifier. We further show that multi-modal fusion of all four modalities improves precision, using late fusion of the output layers of LSTM classifiers each trained on a single modality. Finally, we demonstrate live gesture classification with an LSTM-based classifier, showing that generalization to other persons performing the gestures is high.
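To make the late-fusion setup concrete, the following is a minimal sketch of the idea described above: one LSTM sequence classifier per modality, with the per-modality output layers fused by averaging their softmax distributions. The architecture, hidden size, and feature dimensions here are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch: per-modality LSTM classifiers with late fusion of their outputs.
# Hidden size, feature dimensions, and sequence length are hypothetical.
import torch
import torch.nn as nn


class ModalityLSTM(nn.Module):
    """LSTM sequence classifier for a single modality."""

    def __init__(self, input_dim: int, hidden_dim: int = 128, num_classes: int = 6):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim); classify from the last hidden state.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])          # (batch, num_classes) logits


def late_fusion(logits_per_modality: list) -> torch.Tensor:
    """Fuse per-modality output layers by averaging their softmax distributions."""
    probs = [torch.softmax(logits, dim=-1) for logits in logits_per_modality]
    return torch.stack(probs).mean(dim=0)  # (batch, num_classes)


if __name__ == "__main__":
    # Four modalities with assumed feature dimensions; six gesture classes.
    dims = [63, 63, 640, 640]
    models = [ModalityLSTM(d) for d in dims]
    batch, seq_len = 4, 40
    inputs = [torch.randn(batch, seq_len, d) for d in dims]
    fused = late_fusion([m(x) for m, x in zip(models, inputs)])
    print(fused.argmax(dim=-1))            # predicted gesture class per sample
```

In this sketch each modality-specific classifier is trained independently and only their output distributions are combined, which is what distinguishes late fusion from early fusion (concatenating raw inputs) or intermediate fusion (merging hidden representations).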