Abstract
The capability of a robot to listen to several things at once with its own ears, that is, robot audition, is important for improving interaction and symbiosis between humans and robots. The critical issues in robot audition are real-time processing and robustness against noisy environments, with enough flexibility to support various kinds of robots and hardware configurations. This paper presents two important aspects of robot audition: the Missing-Feature-Theory (MFT) approach and active audition. HARK, an open-source robot audition system, incorporates the MFT approach to recognize speech signals that are localized and separated from a mixture of sounds captured by an 8-channel microphone array. HARK has been ported to four robots, Honda ASIMO, SIG2, Robovie-R2 and HRP-2, with different microphone configurations, and recognizes three simultaneous utterances with a latency of 1.9 sec. In binaural hearing, the best-known problem is the front-back confusion of sound sources. Active binaural robot audition implemented on SIG2 resolves this ambiguity by rotating its head while pitching. This active audition also improves localization at the periphery.
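To make the MFT approach concrete, the following is a minimal sketch (not the HARK implementation) of the two core ideas: marking time-frequency bins as reliable or unreliable based on an estimated local SNR, and evaluating an acoustic model only over the reliable dimensions (the marginalization variant of missing-feature recognition). The threshold value, the diagonal-Gaussian model, and all function names here are illustrative assumptions.

```python
import numpy as np

def missing_feature_mask(spec, noise_est, snr_thresh_db=0.0):
    """Mark time-frequency bins as reliable (1) where the estimated
    local SNR exceeds a threshold; unreliable bins (0) are ignored
    by the recognizer instead of being trusted as clean speech."""
    snr_db = 10.0 * np.log10(np.maximum(spec, 1e-12) /
                             np.maximum(noise_est, 1e-12))
    return (snr_db > snr_thresh_db).astype(float)

def marginalized_log_likelihood(frame, mask, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian, computed
    only over reliable dimensions (unreliable ones are marginalized
    out by simply omitting them)."""
    reliable = mask.astype(bool)
    d = frame[reliable] - mean[reliable]
    v = var[reliable]
    return float(-0.5 * np.sum(np.log(2 * np.pi * v) + d * d / v))

# Toy usage: one frame, two frequency bins; the second bin is noisy.
spec = np.array([10.0, 0.1])      # observed power
noise = np.array([1.0, 1.0])      # estimated noise power
mask = missing_feature_mask(spec, noise)          # -> [1., 0.]
ll = marginalized_log_likelihood(spec, mask,
                                 mean=np.array([10.0, 0.0]),
                                 var=np.array([1.0, 1.0]))
```

In a real system such as HARK, the mask would be derived from the sound source separation stage (separated vs. residual energy per bin) and fed to an MFT-capable decoder; the principle of discounting unreliable spectral evidence is the same.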
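The front-back confusion arises because, under a simple far-field model, the interaural time difference (ITD) depends on sin(azimuth), so a source at azimuth th and one at 180 - th produce identical ITDs. The sketch below illustrates how a head rotation disambiguates the two candidates: after the head turns, only the correct candidate predicts the newly observed ITD. The sine ITD model, the inter-microphone distance, and the function names are simplifying assumptions for illustration, not SIG2's actual processing.

```python
import numpy as np

SOUND_SPEED = 343.0   # m/s
EAR_DISTANCE = 0.18   # m, assumed distance between the two microphones

def itd(azimuth_deg):
    """ITD under a far-field sine model: identical for azimuth th and
    180 - th, which is exactly the front-back confusion."""
    return EAR_DISTANCE / SOUND_SPEED * np.sin(np.radians(azimuth_deg))

def disambiguate(itd_before, itd_after, rotation_deg):
    """Given ITDs observed before and after a head rotation, pick the
    front or back azimuth candidate whose predicted post-rotation ITD
    best matches the second observation."""
    s = np.clip(itd_before * SOUND_SPEED / EAR_DISTANCE, -1.0, 1.0)
    front = float(np.degrees(np.arcsin(s)))   # front-hemisphere candidate
    back = 180.0 - front                      # back-hemisphere candidate
    # Turning the head by rotation_deg shifts the source's azimuth
    # relative to the head by -rotation_deg.
    errs = [abs(itd(c - rotation_deg) - itd_after) for c in (front, back)]
    return (front, back)[int(np.argmin(errs))]

# Toy usage: a source actually behind at 150 deg looks identical to one
# at 30 deg until the head rotates by 20 deg.
before = itd(150.0)
after = itd(150.0 - 20.0)
resolved = disambiguate(before, after, 20.0)   # -> 150.0
```

The same reasoning extends to continuous tracking: integrating observations over a head motion trajectory keeps the correct hypothesis consistent while the mirrored one drifts, which is why active motion also sharpens localization at the periphery, where the ITD curve is flattest.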
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Okuno, H.G., Nakadai, K., Kim, HD. (2011). Robot Audition: Missing Feature Theory Approach and Active Audition. In: Pradalier, C., Siegwart, R., Hirzinger, G. (eds) Robotics Research. Springer Tracts in Advanced Robotics, vol 70. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19457-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19456-6
Online ISBN: 978-3-642-19457-3