Genetic Algorithm-Based Improvement of Robot Hearing Capabilities in Separating and Recognizing Simultaneous Speech Signals

Yamamoto, Shun’ichi; Nakadai, Kazuhiro; Nakano, Mikio; Tsujino, Hiroshi; Valin, Jean-Marc; Takeda, Ryu; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

doi:10.1007/11779568_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4031)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Because a robot usually hears a mixture of sounds, in particular simultaneous speech signals, it should be able to localize, separate, and recognize each speech signal. Separated speech signals suffer from spectral distortion, so conventional automatic speech recognition (ASR) may fail to recognize them. Yamamoto et al. proposed using Missing Feature Theory to mask corrupted features in ASR, and developed an automatic missing-feature-mask generation (AMG) system that uses information obtained from sound source separation (SSS). Our evaluations of the system's recognition performance indicate that it can be improved by optimizing many of its parameters. We used genetic algorithms to optimize these parameters. Each chromosome consists of a set of parameters for SSS and AMG, and each chromosome is evaluated by the recognition rate of the separated sounds. We obtained an optimized set of parameters for each distance (from 50 cm to 250 cm in 50 cm steps) and direction (30-, 60-, and 90-degree intervals) for two simultaneous speech signals. The average isolated word recognition rates ranged from 84.9% to 94.7%.
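
The optimization loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the parameter names and bounds in `PARAM_BOUNDS`, the genetic operators (tournament selection, uniform crossover, Gaussian mutation, elitism), and above all `stand_in_recognition_rate`, a toy function that replaces the real fitness, which would require running SSS, AMG, and ASR on recorded speech and measuring the recognition rate.

```python
import random

# Hypothetical SSS/AMG parameters and ranges; the abstract does not list them.
PARAM_BOUNDS = {
    "sss_step_size": (0.001, 0.1),
    "postfilter_leak": (0.0, 1.0),
    "amg_mask_threshold": (0.0, 1.0),
}
NAMES = list(PARAM_BOUNDS)


def random_chromosome(rng):
    """A chromosome is one value per parameter, drawn uniformly within bounds."""
    return [rng.uniform(*PARAM_BOUNDS[n]) for n in NAMES]


def stand_in_recognition_rate(chromosome):
    """Toy fitness in (0, 100]; peaks at an arbitrary target setting.

    Placeholder only: the real fitness would be the word recognition rate
    measured on separated speech.
    """
    target = [0.01, 0.25, 0.3]
    err = 0.0
    for gene, goal, name in zip(chromosome, target, NAMES):
        lo, hi = PARAM_BOUNDS[name]
        err += ((gene - goal) / (hi - lo)) ** 2
    return 100.0 / (1.0 + err)


def tournament(pop, fits, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    best = max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[best]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal odds."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chrom, rng, rate=0.2, scale=0.1):
    """Perturb some genes with Gaussian noise, then clip to the bounds."""
    out = []
    for gene, name in zip(chrom, NAMES):
        lo, hi = PARAM_BOUNDS[name]
        if rng.random() < rate:
            gene += rng.gauss(0.0, scale * (hi - lo))
        out.append(min(hi, max(lo, gene)))
    return out


def optimize(fitness, pop_size=30, generations=40, seed=0):
    """Return the best parameter set and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        best_i = max(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best_i])
        children = [pop[best_i]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            child = crossover(tournament(pop, fits, rng),
                              tournament(pop, fits, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    fits = [fitness(c) for c in pop]
    best_i = max(range(pop_size), key=lambda i: fits[i])
    history.append(fits[best_i])
    return dict(zip(NAMES, pop[best_i])), history


if __name__ == "__main__":
    best, history = optimize(stand_in_recognition_rate)
    print(best, history[-1])
```

To mirror the per-condition tuning reported in the abstract, one would presumably call `optimize` once for each distance and direction combination, passing a fitness function that evaluates recognition on data recorded under that condition.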

This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for Scientific Research and COE Program of Informatics Research Center for Development of Knowledge Society Infrastructure, and TAF and SCAT Grants.

References

Cohen, I., Berdugo, B.: Microphone Array Post-Filtering for Non-Stationary Noise Suppression. In: Proc. of ICASSP 2002, pp. 901–904. IEEE, Los Alamitos (2002)
Cooke, M., Green, P., Josifovski, L., Vizinho, A.: Robust Automatic Speech Recognition with Missing and Unreliable Acoustic Data. Speech Communication 34, 267–285 (2001)
Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub., Reading (1989)
Hara, I., Asano, F., Asoh, H., et al.: Robust Speech Interface based on Audio and Video Information Fusion for Humanoid HRP-2. In: Proc. of IROS 2004, pp. 2404–2410. IEEE & RSJ (2004)
Nakadai, K., Okuno, H.G., Kitano, H.: Robot Recognizes Three Simultaneous Speech by Active Audition. In: Proc. of ICRA 2003, pp. 398–403. IEEE, Los Alamitos (2003)
Nishimura, Y.: Multiband julius, http://www.furui.cs.titech.ac.jp/mband_julius/
Okuno, H.G., Nakadai, K., Lourens, T., Kitano, H.: Sound and Visual Tracking for Humanoid Robot. In: Monostori, L., Váncza, J., Ali, M. (eds.) IEA/AIE 2001. LNCS, vol. 2070, pp. 640–650. Springer, Heidelberg (2001)
Parra, L.C., Alvino, C.V.: Geometric Source Separation: Merging Convolutive Source Separation with Geometric Beamforming. IEEE Trans. on SAP 10(6), 352–362 (2002)
Tasaki, T., Matsumoto, S., et al.: Distance-Based Dynamic Interaction of Humanoid Robot with Multiple People. In: Ali, M., Esposito, F. (eds.) IEA/AIE 2005. LNCS, vol. 3533, pp. 111–120. Springer, Heidelberg (2005)
Tasaki, T., Komatani, K., Ogata, T., Okuno, H.G.: Spatially Mapping of Friendliness for Human-Robot Interaction. In: Proc. of IROS 2005, pp. 521–526. IEEE, Los Alamitos (2005)
Valin, J.-M., Rouat, J., Michaud, F.: Enhanced Robot Audition based on Microphone Array Source Separation with Post-Filter. In: Proc. of IROS 2004. IEEE & RSJ (2004)
Yamamoto, S., et al.: Assessment of General Applicability of Robot Audition System by Recognizing three Simultaneous Speeches. In: Proc. of IROS 2004, pp. 2111–2116. IEEE & RSJ (2004)
Yamamoto, S., Valin, J.-M., et al.: Enhanced Robot Speech Recognition based on Microphone Array Source Separation and Missing Feature Theory. In: Proc. of ICRA 2005, pp. 1489–1494. IEEE, Los Alamitos (2005)

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Japan
Shun’ichi Yamamoto, Ryu Takeda, Kazunori Komatani, Tetsuya Ogata & Hiroshi G. Okuno
Honda Research Institute Japan Co., Ltd., Japan
Kazuhiro Nakadai, Mikio Nakano & Hiroshi Tsujino
CSIRO ICT Centre, Australia
Jean-Marc Valin

Editor information

Editors and Affiliations

Department of Computer Science, Texas State University-San Marcos, Nueces 247, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali
ESIA Laboratoire d’Informatique, Systèmes, Traitement de l’Information et de la Connaissance, Université de Savoie, B.P. 806, F-74016, ANNECY Cedex, France
Richard Dapoigny

Cite this paper

Yamamoto, S. et al. (2006). Genetic Algorithm-Based Improvement of Robot Hearing Capabilities in Separating and Recognizing Simultaneous Speech Signals. In: Ali, M., Dapoigny, R. (eds) Advances in Applied Artificial Intelligence. IEA/AIE 2006. Lecture Notes in Computer Science, vol 4031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11779568_24

DOI: https://doi.org/10.1007/11779568_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35453-6
Online ISBN: 978-3-540-35454-3
eBook Packages: Computer Science, Computer Science (R0)
