
Multilingual Voice Control for Endoscopic Procedures

  • Conference paper
Internet of Things. User-Centric IoT (IoT360 2014)

Abstract

This paper presents a solution to improve the current workflow of endoscopic exams. These exams require complex procedures, such as manipulating buttons with both hands while pressing a foot pedal, just to perform simple tasks like capturing frames for later analysis. In addition, capturing a frame freezes the video. The developed software module was integrated with the MIVbox, a device for the acquisition, processing and storage of endoscopic exam results. It uses libraries developed by the PocketSphinx project to recognize a small number of voice commands. The module was fine-tuned for the Portuguese language, which presents specific difficulties for speech recognition. A Word Error Rate (WER) of 23.3 % was obtained for the English model and 29.1 % for the Portuguese one.
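The abstract reports results as Word Error Rate. The paper's own scoring tool is not stated here, so the following is only a minimal sketch of the conventional definition, WER = (S + D + I) / N: the substitutions, deletions and insertions in a word-level edit-distance alignment between the reference transcript and the recognizer output, summed over the test set and divided by the total number of reference words. The function names `word_edit_distance` and `corpus_wer` and the example command phrases are hypothetical illustrations, not the paper's command set or code.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (S + D + I)."""
    # prev[j] holds the distance between the previous reference prefix and hyp[:j];
    # keeping only two rows makes memory O(len(hyp)).
    prev = list(range(len(hyp) + 1))  # empty reference -> hyp[:j] needs j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # ref[:i] -> empty hypothesis needs i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over a test set: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("the references must contain at least one word")
    return errors / ref_words


if __name__ == "__main__":
    # Hypothetical utterances: one exact match, one deleted word -> 1 error / 5 words.
    test_set = [
        ("capture frame", "capture frame"),
        ("start recording now", "start recording"),
    ]
    print(corpus_wer(test_set))
```

Pooling errors and reference words across the whole test set, rather than averaging per-utterance rates, keeps short commands from being over-weighted. Note that WER can exceed 100 % when the recognizer inserts many spurious words.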



Acknowledgments

This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project PEst-OE/EEI/UI0752/2014.

Author information

Correspondence to Victor Alves.


Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Afonso, S., Laranjo, I., Braga, J., Alves, V., Neves, J. (2015). Multilingual Voice Control for Endoscopic Procedures. In: Giaffreda, R., et al. Internet of Things. User-Centric IoT. IoT360 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 150. Springer, Cham. https://doi.org/10.1007/978-3-319-19656-5_33


  • DOI: https://doi.org/10.1007/978-3-319-19656-5_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19655-8

  • Online ISBN: 978-3-319-19656-5

  • eBook Packages: Computer Science, Computer Science (R0)
