Abstract
We present a baseline for gesture recognition using state-of-the-art sequence classifiers on a new, freely available multi-modal dataset of free-hand gestures. The dataset consists of roughly 100,000 samples grouped into six classes of typical, easy-to-learn hand gestures. It was recorded with two independent sensors, enabling research on multi-modal data fusion with early, intermediate, and late fusion techniques. Since the entire dataset was recorded by a single person, data quality is high and the risk of incorrectly performed gestures is minimal. We report results on unimodal sequence classification using an LSTM as well as a CNN classifier. We further show that multi-modal fusion of all four modalities improves precision, using late fusion of the output layers of LSTM classifiers each trained on a single modality. Finally, we demonstrate live gesture classification with an LSTM-based classifier, showing that generalization to other persons performing the gestures is high.
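To make the late-fusion setup concrete, the following is a minimal sketch of the idea described above: one LSTM sequence classifier per modality, with the per-modality output layers fused by averaging their softmax distributions. The architecture, hidden size, and feature dimensions here are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch: per-modality LSTM classifiers with late fusion of their outputs.
# Hidden size, feature dimensions, and sequence length are hypothetical.
import torch
import torch.nn as nn


class ModalityLSTM(nn.Module):
    """LSTM sequence classifier for a single modality."""

    def __init__(self, input_dim: int, hidden_dim: int = 128, num_classes: int = 6):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim); classify from the last hidden state.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])          # (batch, num_classes) logits


def late_fusion(logits_per_modality: list) -> torch.Tensor:
    """Fuse per-modality output layers by averaging their softmax distributions."""
    probs = [torch.softmax(logits, dim=-1) for logits in logits_per_modality]
    return torch.stack(probs).mean(dim=0)  # (batch, num_classes)


if __name__ == "__main__":
    # Four modalities with assumed feature dimensions; six gesture classes.
    dims = [63, 63, 640, 640]
    models = [ModalityLSTM(d) for d in dims]
    batch, seq_len = 4, 40
    inputs = [torch.randn(batch, seq_len, d) for d in dims]
    fused = late_fusion([m(x) for m, x in zip(models, inputs)])
    print(fused.argmax(dim=-1))            # predicted gesture class per sample
```

In this sketch each modality-specific classifier is trained independently and only their output distributions are combined, which is what distinguishes late fusion from early fusion (concatenating raw inputs) or intermediate fusion (merging hidden representations).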