Sound and Visual Tracking for Humanoid Robot

Okuno, Hiroshi G.; Nakadai, Kazuhiro; Lourens, Tino; Kitano, Hiroaki

doi:10.1023/B:APIN.0000021417.62541.e0

Published: May 2004

Applied Intelligence, Volume 20, pages 253–266 (2004)


Abstract

Mobile robots capable of auditory perception usually adopt the “stop-perceive-act” principle to avoid the motor noise they generate while moving. Although this principle reduces the complexity of auditory processing for mobile robots, it also restricts their auditory capabilities. In this paper, sound and visual tracking are investigated as a way to compensate for each other's drawbacks in tracking objects and to attain robust object tracking. Visual tracking may be difficult in the case of occlusion, while sound tracking may be ambiguous in localization due to the nature of auditory processing. For this purpose, we present an active audition system for a humanoid robot. The audition system of a highly intelligent humanoid requires localization of sound sources and identification of the meanings of sounds in the auditory scene. The active audition reported in this paper focuses on improved sound source tracking by integrating audition, vision, and motor control. Given multiple sound sources in the auditory scene, the humanoid SIG actively moves its head to improve localization by aligning its microphones orthogonal to the sound source and by capturing the possible sound sources by vision. The system adaptively cancels motor noise using motor control signals. The experimental results demonstrate the effectiveness of sound and visual tracking.
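
The abstract's remarks on ambiguous sound localization and on aligning the microphones orthogonal to the source can be illustrated with a textbook two-microphone model. The sketch below is not the method used for SIG, which this preview does not describe. It assumes a far-field source, a hypothetical microphone spacing of 18 cm, and a speed of sound of 343 m/s, and the function names `itd` and `azimuth_error` are invented for illustration. Under these assumptions the interaural time difference is tau = d sin(theta) / c, so a source in front and its mirror image behind produce the same delay, and a fixed timing error costs the least angular accuracy when the source lies straight ahead.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def itd(azimuth_deg: float, mic_spacing: float = 0.18) -> float:
    """Interaural time difference in seconds for a far-field source.

    azimuth_deg is measured from straight ahead, so 0 means the source
    direction is orthogonal to the microphone axis. mic_spacing is a
    hypothetical 18 cm baseline, not a value taken from the paper.
    """
    return mic_spacing * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def azimuth_error(azimuth_deg: float, timing_error: float,
                  mic_spacing: float = 0.18) -> float:
    """Approximate angular error in degrees caused by a small ITD error.

    First-order relation: d(theta) = c * d(tau) / (d * cos(theta)).
    """
    cos_theta = math.cos(math.radians(azimuth_deg))
    return math.degrees(SPEED_OF_SOUND * timing_error / (mic_spacing * cos_theta))


if __name__ == "__main__":
    # Front-back ambiguity: 30 and 150 degrees yield the same delay.
    print("same ITD:", math.isclose(itd(30.0), itd(150.0)))
    # The same 20-microsecond timing error hurts more as the source
    # moves away from straight ahead.
    for az in (0.0, 45.0, 80.0):
        print(az, "deg ->", azimuth_error(az, 20e-6), "deg error")
```

In this simplified model, turning the head toward a candidate source both minimizes the angular error and, because the delay changes as the head moves, helps tell a front source from its rear mirror image. That is one plausible reading of why active head motion improves localization.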




About this article

Cite this article

Okuno, H.G., Nakadai, K., Lourens, T. et al. Sound and Visual Tracking for Humanoid Robot. Applied Intelligence 20, 253–266 (2004). https://doi.org/10.1023/B:APIN.0000021417.62541.e0

Issue Date: May 2004
DOI: https://doi.org/10.1023/B:APIN.0000021417.62541.e0
