Sound and Visual Tracking for Humanoid Robot

  • Conference paper
  • Included in: Engineering of Intelligent Systems (IEA/AIE 2001)

Abstract

Mobile robots with auditory perception usually adopt the "stop-perceive-act" principle to avoid hearing the sounds they make while moving, such as motor noise or noise from bumpy roads. Although this principle reduces the complexity of auditory processing for mobile robots, it also restricts their auditory capabilities. In this paper, sound and visual tracking are combined to attain robust object tracking, with each modality compensating for the drawbacks of the other: visual tracking may fail under occlusion, while sound tracking may localize only ambiguously due to the nature of auditory processing. For this purpose, we present an active audition system for a humanoid robot. The audition system of an intelligent humanoid requires localization of sound sources and identification of the meanings of sounds in the auditory scene. The active audition reported in this paper focuses on improved sound source tracking by integrating audition, vision, and motor movements. Given multiple sound sources in the auditory scene, the humanoid SIG actively moves its head to improve localization, aligning its microphones orthogonal to the sound source and capturing possible sound sources by vision. The system adaptively cancels motor noise using motor control signals. The experimental results demonstrate the effectiveness and robustness of sound and visual tracking.
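
The complementary fusion the abstract describes can be made concrete with a short sketch. The Python snippet below is not the authors' implementation; it is a minimal illustration, assuming a two-microphone far-field model in which the interaural time difference (ITD) satisfies itd = (d / c) · sin(azimuth), combined with a simple inverse-variance fusion of the auditory and visual azimuth estimates. The function names, the microphone spacing, and the variance inputs are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_DISTANCE = 0.18     # m; hypothetical spacing between the two microphones

def itd_azimuth(left, right, fs):
    """Estimate source azimuth (radians) from one stereo frame.

    The lag of the cross-correlation peak gives the interaural time
    difference (ITD); inverting the far-field model
    itd = (MIC_DISTANCE / c) * sin(azimuth) yields the azimuth.
    """
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # delay in samples
    itd = lag / fs                             # delay in seconds
    s = np.clip(itd * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
    return float(np.arcsin(s))                 # sign convention depends on mic geometry

def fuse_azimuths(az_audio, az_vision, var_audio, var_vision):
    """Inverse-variance weighted fusion of auditory and visual azimuths."""
    w_a, w_v = 1.0 / var_audio, 1.0 / var_vision
    return (w_a * az_audio + w_v * az_vision) / (w_a + w_v)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(1024) / fs
    tone = np.sin(2.0 * np.pi * 440.0 * t)
    left, right = tone[3:], tone[:-3]          # left channel leads by 3 samples
    az_a = itd_azimuth(left, right, fs)
    print("auditory azimuth (deg):", np.degrees(az_a))
    # Vision roughly agrees but has smaller variance here, so it dominates.
    az = fuse_azimuths(az_a, np.radians(-20.0), 0.05, 0.01)
    print("fused azimuth (deg):", np.degrees(az))
```

Because the weights are inverse variances, vision dominates whenever the target is clearly visible, and audition takes over when occlusion inflates the visual variance, which mirrors the complementarity the abstract argues for. The sketch omits the adaptive motor-noise cancellation mentioned in the abstract, which would filter the microphone signals (for example, with an adaptive filter driven by the motor control signals) before localization.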

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Okuno, H.G., Nakadai, K., Lourens, T., Kitano, H. (2001). Sound and Visual Tracking for Humanoid Robot. In: Monostori, L., Váncza, J., Ali, M. (eds) Engineering of Intelligent Systems. IEA/AIE 2001. Lecture Notes in Computer Science, vol 2070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45517-5_71

  • DOI: https://doi.org/10.1007/3-540-45517-5_71

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42219-8

  • Online ISBN: 978-3-540-45517-2

  • eBook Packages: Springer Book Archive
