Abstract
This paper presents a multimodal human-computer interaction system that uses speech and gesture recognition for mouse movement and operation. The approach lets users navigate the mouse pointer and perform mouse operations without physical contact with the system. Assigning mouse navigation to gesture recognition and mouse operations to speech recognition led to a user-friendly and seamless experience. Because no physical contact is required, the system could be used by doctors while performing surgery, by mechanics while they are handling their instruments at a distance from the system, and by casual users if circumstances arise. Unlike a unimodal gesture recognition system, the proposed multimodal system controls the mouse pointer using gestures and employs speech to perform mouse operations.
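The division of labour described in the abstract (gesture recognition for pointer navigation, speech recognition for mouse operations) can be illustrated with a minimal sketch. It is not the authors' implementation: the command vocabulary, the function names `hand_to_screen`, `speech_to_operation`, and `step`, the normalized hand coordinates, and the default screen size are all assumptions made for illustration. The hand tracker and the speech recognizer are left out, so the code only shows how their outputs might be routed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical vocabulary: the spoken phrases and operation names are
# illustrative assumptions, not taken from the paper.
SPEECH_COMMANDS = {
    "click": "left_click",
    "double click": "double_click",
    "right click": "right_click",
    "scroll up": "scroll_up",
    "scroll down": "scroll_down",
}


def hand_to_screen(x_norm: float, y_norm: float,
                   screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Map a normalized hand position in [0, 1] to pixel coordinates."""
    # Clamp so a hand detected slightly outside the frame stays on screen.
    x = min(max(x_norm, 0.0), 1.0)
    y = min(max(y_norm, 0.0), 1.0)
    return int(x * (screen_w - 1)), int(y * (screen_h - 1))


def speech_to_operation(transcript: str) -> Optional[str]:
    """Return the mouse operation for a transcript, or None if unknown."""
    return SPEECH_COMMANDS.get(transcript.strip().lower())


@dataclass
class PointerState:
    position: Tuple[int, int] = (0, 0)
    last_operation: Optional[str] = None


def step(state: PointerState,
         hand: Optional[Tuple[float, float]],
         transcript: Optional[str],
         screen: Tuple[int, int] = (1920, 1080)) -> PointerState:
    """One update: the gesture channel moves the pointer, the speech
    channel (if it produced a known command) selects the operation."""
    if hand is not None:
        state.position = hand_to_screen(hand[0], hand[1], *screen)
    if transcript is not None:
        operation = speech_to_operation(transcript)
        if operation is not None:
            state.last_operation = operation
    return state
```

Keeping the two channels independent means a missing hand detection or an unrecognized utterance leaves the other modality unaffected. In a real system the resulting position and operation would be handed to a mouse-control library.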
Notes
- 1.
- 2. Michael Sanders, AutoPy Introduction and Tutorial, Available: https://pypi.org/project/autopy/.
- 3. NumPy, Available: https://numpy.org/.
- 4. Michael Sanders, AutoPy Introduction and Tutorial, Available: https://pypi.org/project/autopy/.
- 5. Google Cloud speech-to-text, Available: https://cloud.google.com/speech-to-text.
- 6. PyAudio, Available: https://pypi.org/project/PyAudio/.
- 7. Cloud speech-to-text Documentation, Available: https://cloud.google.com/speech-to-text/docs/basics.
- 8. mouse_event function (winuser.h), Available: https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-mouse_event.
- 9. Keyboard module in Python, Available: https://www.geeksforgeeks.org/keyboard-module-in-python/.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ridhun, M., Lewis, R.S., Misquith, S.C., Poojary, S., Mahesh Karimbi, K. (2023). Multimodal Human Computer Interaction Using Hand Gestures and Speech. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_7
DOI: https://doi.org/10.1007/978-3-031-27199-1_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer Science, Computer Science (R0)