
Multimodal Human Computer Interaction Using Hand Gestures and Speech

  • Conference paper
Intelligent Human Computer Interaction (IHCI 2022)

Abstract

The paper presents a multimodal human-computer interaction system that combines speech and gesture recognition for mouse movement and operation. The approach allows users to navigate the mouse pointer and perform various mouse operations without any physical contact with the system. Splitting the tasks of mouse navigation and mouse operations between gesture and speech recognition, respectively, yields a user-friendly and seamless experience. Since no physical contact is required between the user and the system, it could be used by doctors while performing surgery, by mechanics handling their instruments from a distance, and by casual users should circumstances demand it. Unlike a unimodal gesture recognition system, the proposed multimodal system controls the mouse pointer with hand gestures and performs mouse operations through speech commands.
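
As a rough illustration of the gesture half of this pipeline, the sketch below moves the pointer with the index fingertip using the MediaPipe Hands Python API, NumPy and AutoPy cited in the notes that follow. It is a minimal sketch under stated assumptions, not the authors' implementation: the mirroring, the landmark choice and the direct camera-to-screen mapping are all illustrative.

    import cv2
    import mediapipe as mp
    import numpy as np
    import autopy

    SCR_W, SCR_H = autopy.screen.size()  # target screen size

    # Single-hand tracker from the MediaPipe Hands solution API (note 1).
    hands = mp.solutions.hands.Hands(max_num_hands=1,
                                     min_detection_confidence=0.7)
    cap = cv2.VideoCapture(0)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)  # mirror so hand and pointer move alike
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # Index fingertip (landmark 8) in normalised [0, 1] coordinates.
            tip = results.multi_hand_landmarks[0].landmark[
                mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]
            # Map normalised camera coordinates onto the screen.
            x = np.interp(tip.x, (0.0, 1.0), (0, SCR_W - 1))
            y = np.interp(tip.y, (0.0, 1.0), (0, SCR_H - 1))
            autopy.mouse.move(x, y)
        cv2.imshow("hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()

A real implementation would additionally smooth the mapped coordinates and ignore a margin near the frame edges, since raw fingertip positions jitter.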


Notes

  1. MediaPipe Hands, Python Solution API, Available: https://google.github.io/mediapipe/solutions/hands#python-solution-api/.

  2. Michael Sanders, AutoPy Introduction and Tutorial, Available: https://pypi.org/project/autopy/.

  3. NumPy, Available: https://numpy.org/.

  4. Michael Sanders, AutoPy Introduction and Tutorial, Available: https://pypi.org/project/autopy/.

  5. Google Cloud Speech-to-Text, Available: https://cloud.google.com/speech-to-text.

  6. PyAudio, Available: https://pypi.org/project/PyAudio/.

  7. Cloud Speech-to-Text Documentation, Available: https://cloud.google.com/speech-to-text/docs/basics.

  8. mouse_event function (winuser.h), Available: https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-mouse_event.

  9. Keyboard module in Python, Available: https://www.geeksforgeeks.org/keyboard-module-in-python/.
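
Notes 5–8 name the components of the speech half of the system. The sketch below ties them together under stated assumptions: a Windows host (for the Win32 mouse_event call of note 8), Google Cloud credentials already configured in the environment, 16 kHz mono microphone input, and a hypothetical two-command vocabulary; the authors' actual command set and audio handling may differ.

    import ctypes

    import pyaudio
    from google.cloud import speech

    # MOUSEEVENTF_* flags from winuser.h (note 8).
    LEFTDOWN, LEFTUP = 0x0002, 0x0004
    RIGHTDOWN, RIGHTUP = 0x0008, 0x0010

    def click(down, up):
        """Synthesise a press-and-release at the current pointer position."""
        ctypes.windll.user32.mouse_event(down, 0, 0, 0, 0)
        ctypes.windll.user32.mouse_event(up, 0, 0, 0, 0)

    COMMANDS = {  # hypothetical vocabulary, not the paper's command set
        "click": lambda: click(LEFTDOWN, LEFTUP),
        "right click": lambda: click(RIGHTDOWN, RIGHTUP),
    }

    RATE = 16000

    def record(seconds=3):
        """Capture a short utterance from the default microphone (PyAudio)."""
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                         input=True, frames_per_buffer=1024)
        frames = [stream.read(1024) for _ in range(RATE * seconds // 1024)]
        stream.stop_stream()
        stream.close()
        pa.terminate()
        return b"".join(frames)

    # Send the raw LINEAR16 audio to Cloud Speech-to-Text (notes 5 and 7).
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code="en-US")
    audio = speech.RecognitionAudio(content=record())

    for result in client.recognize(config=config, audio=audio).results:
        text = result.alternatives[0].transcript.strip().lower()
        COMMANDS.get(text, lambda: None)()  # run the matching mouse action

Looping this capture-recognise-dispatch cycle alongside the gesture tracker sketched earlier reproduces the division of labour described in the abstract: gestures steer the pointer, speech triggers the operations.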


Author information


Corresponding author

Correspondence to Kavitha Mahesh Karimbi.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ridhun, M., Lewis, R.S., Misquith, S.C., Poojary, S., Mahesh Karimbi, K. (2023). Multimodal Human Computer Interaction Using Hand Gestures and Speech. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_7


  • DOI: https://doi.org/10.1007/978-3-031-27199-1_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27198-4

  • Online ISBN: 978-3-031-27199-1

  • eBook Packages: Computer Science, Computer Science (R0)
