
Multimodal Human Computer Interaction Using Hand Gestures and Speech

  • Conference paper
Intelligent Human Computer Interaction (IHCI 2022)

Abstract

The paper presents a multimodal human-computer interaction system that combines speech and gesture recognition for mouse movement and operation. The approach allows users to navigate the mouse pointer and perform various mouse operations without any physical contact with the system. Splitting the tasks of mouse navigation and mouse operations between gesture and speech recognition, respectively, yields a user-friendly and seamless experience. Since no physical contact is required between the user and the system, it could be used by doctors while performing surgery, by mechanics handling their instruments from a distance, and by casual users should circumstances demand it. Unlike a unimodal gesture recognition system, the proposed multimodal system controls the mouse pointer with hand gestures and performs mouse operations through speech commands.
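
As a rough illustration of the gesture half of this pipeline, the sketch below moves the pointer with the index fingertip using the MediaPipe Hands Python API, NumPy and AutoPy cited in the notes that follow. It is a minimal sketch under stated assumptions, not the authors' implementation: the mirroring, the landmark choice and the direct camera-to-screen mapping are all illustrative.

    import cv2
    import mediapipe as mp
    import numpy as np
    import autopy

    SCR_W, SCR_H = autopy.screen.size()  # target screen size

    # Single-hand tracker from the MediaPipe Hands solution API (note 1).
    hands = mp.solutions.hands.Hands(max_num_hands=1,
                                     min_detection_confidence=0.7)
    cap = cv2.VideoCapture(0)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)  # mirror so hand and pointer move alike
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # Index fingertip (landmark 8) in normalised [0, 1] coordinates.
            tip = results.multi_hand_landmarks[0].landmark[
                mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]
            # Map normalised camera coordinates onto the screen.
            x = np.interp(tip.x, (0.0, 1.0), (0, SCR_W - 1))
            y = np.interp(tip.y, (0.0, 1.0), (0, SCR_H - 1))
            autopy.mouse.move(x, y)
        cv2.imshow("hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()

A real implementation would additionally smooth the mapped coordinates and ignore a margin near the frame edges, since raw fingertip positions jitter.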


Notes

  1. MediaPipe Hands, Python Solution API, Available: https://google.github.io/mediapipe/solutions/hands#python-solution-api/.

  2. Michael Sanders, AutoPy Introduction and Tutorial, Available: https://pypi.org/project/autopy/.

  3. NumPy, Available: https://numpy.org/.

  4. Michael Sanders, AutoPy Introduction and Tutorial, Available: https://pypi.org/project/autopy/.

  5. Google Cloud Speech-to-Text, Available: https://cloud.google.com/speech-to-text.

  6. PyAudio, Available: https://pypi.org/project/PyAudio/.

  7. Cloud Speech-to-Text Documentation, Available: https://cloud.google.com/speech-to-text/docs/basics.

  8. mouse_event function (winuser.h), Available: https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-mouse_event.

  9. Keyboard module in Python, Available: https://www.geeksforgeeks.org/keyboard-module-in-python/.
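
Notes 5–8 name the components of the speech half of the system. The sketch below ties them together under stated assumptions: a Windows host (for the Win32 mouse_event call of note 8), Google Cloud credentials already configured in the environment, 16 kHz mono microphone input, and a hypothetical two-command vocabulary; the authors' actual command set and audio handling may differ.

    import ctypes

    import pyaudio
    from google.cloud import speech

    # MOUSEEVENTF_* flags from winuser.h (note 8).
    LEFTDOWN, LEFTUP = 0x0002, 0x0004
    RIGHTDOWN, RIGHTUP = 0x0008, 0x0010

    def click(down, up):
        """Synthesise a press-and-release at the current pointer position."""
        ctypes.windll.user32.mouse_event(down, 0, 0, 0, 0)
        ctypes.windll.user32.mouse_event(up, 0, 0, 0, 0)

    COMMANDS = {  # hypothetical vocabulary, not the paper's command set
        "click": lambda: click(LEFTDOWN, LEFTUP),
        "right click": lambda: click(RIGHTDOWN, RIGHTUP),
    }

    RATE = 16000

    def record(seconds=3):
        """Capture a short utterance from the default microphone (PyAudio)."""
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                         input=True, frames_per_buffer=1024)
        frames = [stream.read(1024) for _ in range(RATE * seconds // 1024)]
        stream.stop_stream()
        stream.close()
        pa.terminate()
        return b"".join(frames)

    # Send the raw LINEAR16 audio to Cloud Speech-to-Text (notes 5 and 7).
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code="en-US")
    audio = speech.RecognitionAudio(content=record())

    for result in client.recognize(config=config, audio=audio).results:
        text = result.alternatives[0].transcript.strip().lower()
        COMMANDS.get(text, lambda: None)()  # run the matching mouse action

Looping this capture-recognise-dispatch cycle alongside the gesture tracker sketched earlier reproduces the division of labour described in the abstract: gestures steer the pointer, speech triggers the operations.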


Author information


Corresponding author

Correspondence to Kavitha Mahesh Karimbi.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ridhun, M., Lewis, R.S., Misquith, S.C., Poojary, S., Mahesh Karimbi, K. (2023). Multimodal Human Computer Interaction Using Hand Gestures and Speech. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_7


  • DOI: https://doi.org/10.1007/978-3-031-27199-1_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27198-4

  • Online ISBN: 978-3-031-27199-1

  • eBook Packages: Computer Science, Computer Science (R0)
