Multilingual Voice Control for Endoscopic Procedures

Afonso, Simão; Laranjo, Isabel; Braga, Joel; Alves, Victor; Neves, José

doi:10.1007/978-3-319-19656-5_33

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 150)

Included in the following conference series:

International Internet of Things Summit

Abstract

This paper presents a solution to improve the current workflow of endoscopic exams. These exams require complex procedures, such as using both hands to manipulate buttons while pressing a foot pedal, to perform simple tasks like capturing frames for later analysis. In addition, capturing a frame freezes the video. The developed software module was integrated with the MIVbox device, which acquires, processes and stores endoscopic results. The module uses libraries developed by the PocketSphinx project to recognize a small set of commands. It was fine-tuned for the Portuguese language, which presents some specific difficulties for speech recognition. A Word Error Rate (WER) of 23.3 % was obtained for the English model and 29.1 % for the Portuguese one.
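
For readers unfamiliar with the metric, the following is a minimal sketch of how a Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the example commands are hypothetical and are not taken from the paper; the abstract does not describe the authors' evaluation procedure, so this illustrates the standard definition only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice commands, for illustration only.
wer = word_error_rate("capture frame now", "capture friend now")
print(f"WER = {wer:.1%}")
```

Because insertions count as errors, the WER can exceed 100 % when the recognizer outputs many more words than were spoken.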

Acknowledgments

This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project PEst-OE/EEI/UI0752/2014.

Author information

Authors and Affiliations

CCTC - Computer Science and Technology Center, University of Minho, Braga, Portugal
Simão Afonso, Isabel Laranjo, Joel Braga, Victor Alves & José Neves

Corresponding author

Correspondence to Victor Alves.

Editor information

Editors and Affiliations

CREATE-NET, Trento, Italy
Raffaele Giaffreda
University of Trento, Trento, Italy
Radu-Laurentiu Vieriu
Management Consultant, Edna Pasher Ph.D & Associates, Tel Aviv, Israel
Edna Pasher
Management Consultant, Edna Pasher Ph.D & Associates, Tel Aviv, Israel
Gabriel Bendersky
University of Applied Sciences, Institute of Information Systems, Delémont, Switzerland
Antonio J. Jara
University of Beira Interior, Covilhã, Portugal
Joel J.P.C. Rodrigues
IBM Research Laboratory, Haifa, Israel
Eliezer Dekel
IBM Research, Haifa, Israel
Benny Mandler

About this paper

Cite this paper

Afonso, S., Laranjo, I., Braga, J., Alves, V., Neves, J. (2015). Multilingual Voice Control for Endoscopic Procedures. In: Giaffreda, R., et al. Internet of Things. User-Centric IoT. IoT360 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 150. Springer, Cham. https://doi.org/10.1007/978-3-319-19656-5_33

DOI: https://doi.org/10.1007/978-3-319-19656-5_33
Published: 26 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19655-8
Online ISBN: 978-3-319-19656-5
eBook Packages: Computer Science, Computer Science (R0)
