Abstract
Social interaction is essential to improving human-robot interfaces. Behaviors for social interaction may include paying attention to a new sound source, moving toward it, or keeping face-to-face contact with a moving speaker. Some sound-centered behaviors are difficult to attain, because mixtures of sounds are not handled well or because auditory processing is too slow for real-time applications. Recently, Nakadai et al. developed real-time auditory and visual multiple-talker tracking technology that associates auditory and visual streams. The system is implemented on an upper-torso humanoid, and real-time talker tracking is attained with a delay of 200 msec by distributed processing on four PCs connected by Gigabit Ethernet. Focus-of-attention is programmable and allows a variety of behaviors. The system demonstrates non-verbal social interaction: a receptionist robot focuses on an associated stream, while a companion robot focuses on an auditory stream.
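The programmable focus-of-attention described above can be pictured as a priority over stream types: the receptionist behavior prefers an associated (audio-visual) stream, while the companion behavior prefers an auditory stream. The sketch below is purely illustrative; the `Stream` class, the `PRIORITIES` tables, and the selection rule are hypothetical simplifications, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    """A perceptual stream; 'associated' means auditory and visual streams were matched."""
    kind: str          # "associated", "auditory", or "visual"
    azimuth: float     # direction of the talker, in degrees (hypothetical attribute)

# Hypothetical priority tables for the two behaviors named in the abstract:
# a receptionist attends to associated streams first, a companion to auditory ones.
PRIORITIES = {
    "receptionist": {"associated": 0, "auditory": 1, "visual": 2},
    "companion":    {"auditory": 0, "associated": 1, "visual": 2},
}

def focus_of_attention(streams, role):
    """Select the stream to attend to under the given role's priority table."""
    if not streams:
        return None
    return min(streams, key=lambda s: PRIORITIES[role][s.kind])

streams = [Stream("visual", 30.0), Stream("auditory", -45.0), Stream("associated", 10.0)]
print(focus_of_attention(streams, "receptionist").kind)  # associated
print(focus_of_attention(streams, "companion").kind)     # auditory
```

Swapping the priority table is what makes the same tracking machinery support different social behaviors; the robot would then orient toward the azimuth of the selected stream.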
References
Breazeal, C., and Scassellati, B. A context-dependent attention system for a social robot. Proceedings of the Sixteenth International Joint Conf. on Artificial Intelligence (IJCAI-99), 1146–1151.
Breazeal, C. Emotive qualities in robot speech. Proc. of IEEE/RSJ International Conf. on Intelligent Robots and Systems (IROS-2001), 1389–1394.
Brooks, R. A., Breazeal, C., Irie, R., Kemp, C. C., Marjanovic, M., Scassellati, B., and Williamson, M. M. Alternative essences of intelligence. Proc. of 15th National Conf. on Artificial Intelligence (AAAI-98), 961–968.
Horvitz, E., and Paek, T. A computational architecture for conversation. Proc. of Seventh International Conf. on User Modeling (1999), Springer, 201–210.
Kagami, S., Okada, K., Inaba, M., and Inoue, H. Real-time 3D optical flow generation system. Proc. of International Conf. on Multisensor Fusion and Integration for Intelligent Systems (MFI'99), 237–242.
Kawahara, T., Lee, A., Kobayashi, T., Takeda, K., Minematsu, N., Itou, K., Ito, A., Yamamoto, M., Yamada, A., Utsuro, T., and Shikano, K. Japanese dictation toolkit (1997 version). Journal of the Acoustical Society of Japan (E) 20, 3 (1999), 233–239.
Matsusaka, Y., Tojo, T., Kubota, S., Furukawa, K., Tamiya, D., Hayata, K., Nakano, Y., and Kobayashi, T. Multi-person conversation via multi-modal interface — a robot who communicates with multi-user. Proc. of 6th European Conf. on Speech Communication and Technology (EUROSPEECH-99), ESCA, 1723–1726.
Nakadai, K., Lourens, T., Okuno, H. G., and Kitano, H. Active audition for humanoid. Proc. of 17th National Conf. on Artificial Intelligence (AAAI-2000), 832–839.
Nakadai, K., Matsui, T., Okuno, H. G., and Kitano, H. Active audition system and humanoid exterior design. Proc. of IEEE/RSJ International Conf. on Intelligent Robots and Systems (IROS-2000), 1453–1461.
Nakadai, K., Hidai, K., Mizoguchi, H., Okuno, H. G., and Kitano, H. Real-time auditory and visual multiple-object tracking for robots. Proc. of the Seventeenth International Joint Conf. on Artificial Intelligence (IJCAI-01), 1425–1432.
Okuno, H. G., Nakadai, K., Lourens, T., and Kitano, H. Sound and visual tracking for humanoid robot. Proc. of Fourteenth International Conf. on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-2001) (Jun. 2001), LNAI 2070, Springer-Verlag, 640–650.
Okuno, H. G., Nakatani, T., and Kawabata, T. Listening to two simultaneous speeches. Speech Communication 27, 3–4 (1999), 281–298.
Ono, T., Imai, M., and Ishiguro, H. A model of embodied communications with gestures between humans and robots. Proc. of Twenty-third Annual Meeting of the Cognitive Science Society (CogSci2001), AAAI, 732–737.
Waldherr, S., Thrun, S., Romero, R., and Margaritis, D. Template-based recognition of pose and motion gestures on a mobile robot. Proc. of 15th National Conf. on Artificial Intelligence (AAAI-98), 977–982.
Wolfe, J., Cave, K. R., and Franzel, S. Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance 15, 3 (1989), 419–433.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Okuno, H.G., Nakadai, K., Kitano, H. (2002). Social Interaction of Humanoid Robot Based on Audio-Visual Tracking. In: Hendtlass, T., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2002. Lecture Notes in Computer Science(), vol 2358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48035-8_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43781-9
Online ISBN: 978-3-540-48035-8