Abstract
As the devices around us become more intelligent, new ways of interacting with them are sought to improve user convenience and comfort. Although gesture-controlled systems have existed for some time, they typically rely on specialized imaging hardware, demand excessive computing resources, or lack the accuracy needed to be a viable alternative. This work proposes a reliable method for recognizing gestures. The model correctly classifies keyboard-typing hand gestures from activity captured by an ordinary camera. Two baseline models are first developed: one classifies the video data directly, and the other classifies time-series sequences of skeleton keypoints extracted from the video. The two models use different classification strategies and are built on lightweight architectures. These baselines are then integrated into a single multimodal model with two inputs, video and time series, to improve accuracy, and the baselines' performance is compared against it. Because the multimodal classifier is built from the initial models, it naturally inherits the strengths of both baseline architectures and achieves a test accuracy of 100%, compared with 85% for the video baseline and 75% for the time-series baseline.
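The two-branch fusion described in the abstract can be sketched as follows. This is a minimal illustrative model, not the authors' published architecture: the layer sizes, keypoint count, and fusion-by-concatenation design are assumptions chosen to show the pattern of a lightweight 3D CNN over video frames combined with an LSTM over skeleton sequences.

```python
import torch
import torch.nn as nn

class MultiModalGestureNet(nn.Module):
    """Illustrative two-branch classifier: a lightweight 3D CNN for video
    clips and an LSTM for skeleton keypoint sequences, fused by
    concatenating the two feature vectors before a linear head."""

    def __init__(self, num_classes=10, num_keypoints=21):
        super().__init__()
        # Video branch; input shape (batch, channels, frames, height, width)
        self.video_branch = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),  # -> (batch, 32)
        )
        # Skeleton branch; input shape (batch, time, 2 * num_keypoints)
        # with (x, y) coordinates per keypoint at each time step
        self.skeleton_branch = nn.LSTM(
            input_size=2 * num_keypoints, hidden_size=32, batch_first=True
        )
        # Fusion head over the concatenated branch features
        self.head = nn.Linear(32 + 32, num_classes)

    def forward(self, video, skeleton):
        v = self.video_branch(video)
        _, (h, _) = self.skeleton_branch(skeleton)
        s = h[-1]  # last-layer final hidden state -> (batch, 32)
        return self.head(torch.cat([v, s], dim=1))

model = MultiModalGestureNet(num_classes=10)
video = torch.randn(2, 3, 8, 32, 32)  # 2 clips, 8 RGB frames of 32x32
skeleton = torch.randn(2, 8, 42)      # 2 sequences, 8 steps, 21 (x, y) points
logits = model(video, skeleton)
print(logits.shape)  # torch.Size([2, 10])
```

Late fusion by concatenation is only one way to combine the modalities; the point of the sketch is that each baseline branch survives intact inside the joint model, which is why the combined classifier can inherit the strengths of both.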
Acknowledgments
This work was supported in part by Liaoning Province Applied Basic Research Program: Human-machine Fusion Intelligent Modeling and Collaborative Optimization Driven by Data and Knowledge under Grant 2023JH2/101300184.
© 2024 IFIP International Federation for Information Processing
Cite this paper
Fulsunder, S., Umar, S., Taherkhani, A., Liu, C., Yang, S. (2024). Hand Gesture Recognition Using a Multi-modal Deep Neural Network. In: Shi, Z., Torresen, J., Yang, S. (eds) Intelligent Information Processing XII. IIP 2024. IFIP Advances in Information and Communication Technology, vol 704. Springer, Cham. https://doi.org/10.1007/978-3-031-57919-6_14
Print ISBN: 978-3-031-57918-9
Online ISBN: 978-3-031-57919-6