Hand Gesture Recognition Using a Multi-modal Deep Neural Network

  • Conference paper
  • In: Intelligent Information Processing XII (IIP 2024)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 704)


Abstract

As the devices around us become more intelligent, new ways of interacting with them are sought to improve user convenience and comfort. Gesture-controlled systems have existed for some time, but they typically rely on specialized imaging equipment, demand unreasonable computing resources, or are simply not accurate enough to be a viable alternative. In this work, a reliable method of recognizing hand gestures is proposed. The model correctly classifies hand gestures for keyboard typing from activity captured by an ordinary camera. Two baseline models are first developed: one classifies the video data directly, and the other classifies time-series sequences of skeleton data extracted from the video. The two models use different classification strategies and are built on lightweight architectures. They are then integrated into a single multi-modal model with multiple inputs, i.e., video and time-series inputs, to improve accuracy. The performance of the baseline models is then compared with that of the multi-modal classifier. Since the multi-modal classifier is built on the baseline models, it naturally inherits the benefits of both architectures and achieves a test accuracy of 100%, compared with 85% and 75% for the two baseline models, respectively.
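The abstract describes a feature-level fusion of two lightweight branches, a video classifier and a skeleton time-series classifier, joined into one multi-input network. The paper's actual layers, input shapes, and gesture classes are not given on this page, so the following Keras sketch is only a rough illustration of that fusion pattern; every size, shape, and name in it is an assumption.

```python
# Hypothetical sketch of the two-input fusion architecture described in
# the abstract. All shapes, layer sizes, and the class count are assumed,
# not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 5               # assumed number of typing-gesture classes
FRAMES, H, W = 16, 64, 64     # assumed clip length and frame size
TIMESTEPS, FEATURES = 16, 42  # e.g. 21 hand keypoints x (x, y) per frame

# Branch 1: lightweight 3D CNN over the raw video clip.
video_in = layers.Input(shape=(FRAMES, H, W, 3), name="video")
v = layers.Conv3D(16, 3, activation="relu")(video_in)
v = layers.MaxPooling3D(2)(v)
v = layers.Conv3D(32, 3, activation="relu")(v)
v = layers.GlobalAveragePooling3D()(v)     # -> fixed-size video embedding

# Branch 2: LSTM over the extracted skeleton keypoint time series.
skel_in = layers.Input(shape=(TIMESTEPS, FEATURES), name="skeleton")
s = layers.LSTM(64)(skel_in)               # -> fixed-size sequence embedding

# Fusion: concatenate the two embeddings and classify jointly.
x = layers.concatenate([v, s])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=[video_in, skel_in], outputs=out)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Concatenating the two branch embeddings before the final classifier is one common fusion strategy; the same two inputs could also be fused earlier (at the feature-map level) or later (by averaging per-branch predictions).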



Acknowledgments

This work was supported in part by the Liaoning Province Applied Basic Research Program (Human-machine Fusion Intelligent Modeling and Collaborative Optimization Driven by Data and Knowledge) under Grant 2023JH2/101300184.

Author information

Corresponding author

Correspondence to Aboozar Taherkhani.

Copyright information

© 2024 IFIP International Federation for Information Processing

About this paper

Cite this paper

Fulsunder, S., Umar, S., Taherkhani, A., Liu, C., Yang, S. (2024). Hand Gesture Recognition Using a Multi-modal Deep Neural Network. In: Shi, Z., Torresen, J., Yang, S. (eds) Intelligent Information Processing XII. IIP 2024. IFIP Advances in Information and Communication Technology, vol 704. Springer, Cham. https://doi.org/10.1007/978-3-031-57919-6_14

  • DOI: https://doi.org/10.1007/978-3-031-57919-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-57918-9

  • Online ISBN: 978-3-031-57919-6

  • eBook Packages: Computer Science, Computer Science (R0)
