Abstract
This paper presents a solution to improve the current workflow of endoscopic exams. These exams require complex procedures, such as using both hands to manipulate buttons while pressing a foot pedal, to perform simple tasks like capturing frames for later analysis. In addition, capturing a frame freezes the video. The developed software module was integrated with the MIVbox device, which acquires, processes and stores endoscopic results. It uses libraries developed by the PocketSphinx project to recognize a small set of commands. The module was fine-tuned for the Portuguese language, which presents some specific difficulties for speech recognition. A Word Error Rate (WER) of 23.3 % was obtained for the English model and 29.1 % for the Portuguese one.
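For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words needed to turn the recognizer output into the reference transcript, and N is the number of words in the reference. The abstract does not state how the reported figures were computed, so the following is only a minimal sketch of that conventional definition. The function name and the example commands are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance divided by reference length.

    Assumption: whitespace tokenization and exact string matching of words;
    the paper's own scoring procedure is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice commands, used only to illustrate the calculation.
print(word_error_rate("start recording", "stop recording"))
```

Because insertions count as errors while N only counts reference words, WER can exceed 100 % when the recognizer outputs many spurious words.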
Acknowledgments
This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project PEst-OE/EEI/UI0752/2014.
Copyright information
© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Afonso, S., Laranjo, I., Braga, J., Alves, V., Neves, J. (2015). Multilingual Voice Control for Endoscopic Procedures. In: Giaffreda, R., et al. Internet of Things. User-Centric IoT. IoT360 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 150. Springer, Cham. https://doi.org/10.1007/978-3-319-19656-5_33
DOI: https://doi.org/10.1007/978-3-319-19656-5_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19655-8
Online ISBN: 978-3-319-19656-5
eBook Packages: Computer Science, Computer Science (R0)