Abstract
A robot usually hears a mixture of sounds, in particular simultaneous speech signals, so it should be able to localize, separate, and recognize each speech signal. Because separated speech signals suffer from spectral distortion, conventional automatic speech recognition (ASR) may fail to recognize them. Yamamoto et al. proposed using the Missing Feature Theory to mask corrupt features in ASR, and developed an automatic missing-feature-mask generation (AMG) system that uses information obtained from sound source separation (SSS). Our evaluations of the system's recognition performance suggest that it can be improved by optimizing many of its parameters. We used genetic algorithms to optimize these parameters. Each chromosome consists of a set of parameters for SSS and AMG and is evaluated by the recognition rate of the separated sounds. We obtained an optimized set of parameters for each distance (from 50 cm to 250 cm in 50 cm steps) and direction (30, 60, and 90 degree intervals) for two simultaneous speech signals. The average isolated word recognition rates ranged from 84.9% to 94.7%.
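The optimization loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-gene chromosome, the parameter bounds, the tournament selection, uniform crossover, and Gaussian mutation are all assumptions chosen for illustration, and `fitness` is a toy stand-in for the recognition rate that the real system would obtain by running SSS, AMG, and ASR on recorded speech.

```python
import random

# Hypothetical bounds for three SSS/AMG parameters. The abstract does not
# list the actual parameters or their ranges, so these are placeholders.
BOUNDS = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]


def fitness(chromosome):
    """Toy stand-in for the recognition rate of separated speech.

    The real evaluation would run separation, mask generation and ASR.
    Here the score simply peaks at 1.0 when every gene equals 0.5.
    """
    return 1.0 - sum((g - 0.5) ** 2 for g in chromosome) / len(chromosome)


def random_chromosome(rng):
    # One chromosome is one candidate set of parameters.
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def tournament(population, scores, rng, k=3):
    # Pick k random individuals and return the fittest of them.
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chromosome, rng, rate=0.1, sigma=0.1):
    # Perturb some genes with Gaussian noise, clipped to the bounds.
    mutated = []
    for gene, (lo, hi) in zip(chromosome, BOUNDS):
        if rng.random() < rate:
            gene = min(hi, max(lo, gene + rng.gauss(0.0, sigma)))
        mutated.append(gene)
    return mutated


def optimize(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        top = max(range(pop_size), key=lambda i: scores[i])
        if scores[top] > best_score:
            best, best_score = population[top], scores[top]
        # Elitism: the best chromosome found so far survives unchanged.
        next_population = [best]
        while len(next_population) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            next_population.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_population
    return best, best_score


if __name__ == "__main__":
    params, score = optimize()
    print("best parameters:", params)
    print("stand-in recognition rate:", score)
```

In a setting like the one the abstract describes, one such run would presumably be repeated for each distance and direction condition, with `fitness` replaced by the measured recognition rate for that condition.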
This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for Scientific Research and COE Program of Informatics Research Center for Development of Knowledge Society Infrastructure, and TAF and SCAT Grants.
References
Cohen, I., Berdugo, B.: Microphone Array Post-Filtering for Non-Stationary Noise Suppression. In: Proc. of ICASSP 2002, pp. 901–904. IEEE, Los Alamitos (2002)
Cooke, M., Green, P., Josifovski, L., Vizinho, A.: Robust Automatic Speech Recognition with Missing and Unreliable Acoustic Data. Speech Communication 34, 267–285 (2001)
Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub., Reading (1989)
Hara, I., Asano, F., Asoh, H., et al.: Robust Speech Interface based on Audio and Video Information Fusion for Humanoid HRP-2. In: Proc. of IROS 2004, pp. 2404–2410. IEEE & RSJ (2004)
Nakadai, K., Okuno, H.G., Kitano, H.: Robot Recognizes Three Simultaneous Speech by Active Audition. In: Proc. of ICRA 2003, pp. 398–403. IEEE, Los Alamitos (2003)
Nishimura, Y.: Multiband julius, http://www.furui.cs.titech.ac.jp/mband_julius/
Okuno, H.G., Nakadai, K., Lourens, T., Kitano, H.: Sound and Visual Tracking for Humanoid Robot. In: Monostori, L., Váncza, J., Ali, M. (eds.) IEA/AIE 2001. LNCS, vol. 2070, pp. 640–650. Springer, Heidelberg (2001)
Parra, L.C., Alvino, C.V.: Geometric Source Separation: Merging Convolutive Source Separation with Geometric Beamforming. IEEE Trans. on SAP 10(6), 352–362 (2002)
Tasaki, T., Matsumoto, S., et al.: Distance-Based Dynamic Interaction of Humanoid Robot with Multiple People. In: Ali, M., Esposito, F. (eds.) IEA/AIE 2005. LNCS, vol. 3533, pp. 111–120. Springer, Heidelberg (2005)
Tasaki, T., Komatani, K., Ogata, T., Okuno, H.G.: Spatially Mapping of Friendliness for Human-Robot Interaction. In: Proc. of IROS 2005, pp. 521–526. IEEE, Los Alamitos (2005)
Valin, J.-M., Rouat, J., Michaud, F.: Enhanced Robot Audition based on Microphone Array Source Separation with Post-Filter. In: Proc. of IROS 2004. IEEE & RSJ (2004)
Yamamoto, S., et al.: Assessment of General Applicability of Robot Audition System by Recognizing three Simultaneous Speeches. In: Proc. of IROS 2004, pp. 2111–2116. IEEE & RSJ (2004)
Yamamoto, S., Valin, J.-M., et al.: Enhanced Robot Speech Recognition based on Microphone Array Source Separation and Missing Feature Theory. In: Proc. of ICRA 2005, pp. 1489–1494. IEEE, Los Alamitos (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yamamoto, S. et al. (2006). Genetic Algorithm-Based Improvement of Robot Hearing Capabilities in Separating and Recognizing Simultaneous Speech Signals. In: Ali, M., Dapoigny, R. (eds) Advances in Applied Artificial Intelligence. IEA/AIE 2006. Lecture Notes in Computer Science, vol 4031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11779568_24
DOI: https://doi.org/10.1007/11779568_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35453-6
Online ISBN: 978-3-540-35454-3
eBook Packages: Computer Science (R0)