Abstract
In this paper we propose an embodied approach to automatic speech recognition in which a humanoid robot adjusts its orientation to the angle that increases the signal-to-noise ratio of speech. In other words, the robot turns its face to 'hear' the speaker better, similar to what people with auditory deficiencies do. The robot tracks a speaker with a binaural sound source localisation (SSL) system that uses spiking neural networks to model the areas of the mammalian auditory pathway relevant to SSL. Speech recognition accuracy doubles when the robot orients towards the speaker at an optimal angle and listens through only one ear instead of averaging the input from both ears.
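The idea of turning towards the angle with the best signal-to-noise ratio (SNR) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the candidate head angles and the power values are hypothetical, and the sketch assumes that the signal and noise powers at one ear have already been measured for each angle.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels, computed from mean powers."""
    return 10.0 * math.log10(signal_power / noise_power)


def best_orientation(snr_by_angle):
    """Return the head angle (in degrees) with the highest SNR."""
    return max(snr_by_angle, key=snr_by_angle.get)


# Hypothetical (signal power, noise power) pairs at one ear, keyed by
# head angle in degrees relative to the speaker. Illustrative values only.
measurements = {
    -90: (1.0, 1.0),
    -45: (2.0, 1.0),
    0: (4.0, 1.0),
    45: (8.0, 1.0),
    90: (4.0, 2.0),
}

snr_by_angle = {
    angle: snr_db(signal, noise)
    for angle, (signal, noise) in measurements.items()
}
best = best_orientation(snr_by_angle)
print(f"Turn to {best} degrees (SNR {snr_by_angle[best]:.2f} dB)")
```

In this toy data the best angle is not the frontal one, which mirrors the abstract's point that an ear turned towards the speaker can receive a cleaner signal than a head facing the speaker directly.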
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Dávila-Chacón, J., Twiefel, J., Liu, J., Wermter, S. (2014). Improving Humanoid Robot Speech Recognition with Sound Source Localisation. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_78
DOI: https://doi.org/10.1007/978-3-319-11179-7_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7
eBook Packages: Computer Science (R0)