Integration of Multiple Sound Source Localization Results for Speaker Identification in Multiparty Dialogue System

  • Conference paper
  • In: Natural Interaction with Robots, Knowbots and Smartphones

Abstract

Humanoid robots need to turn toward human participants when answering their questions in multiparty dialogues. Some participant positions are difficult for a robot to localize in multiparty situations, especially when the robot can rely only on its own sensors. We present a method that identifies the speaker more accurately by integrating the multiple sound source localization results obtained from two robots: one talking mainly with the participants and the other joining the conversation when necessary. We place the robots so that they compensate for each other's localization capabilities and then integrate their two results. Our experimental evaluation revealed that using two robots improved speaker identification compared with using only one robot. We furthermore implemented our method on humanoid robots and constructed a demo system.
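The integration described above can be illustrated, in simplified form, as bearing-line triangulation: each robot reports an estimated direction of arrival for the active sound source, and the two bearings are intersected to estimate the speaker's position, which is then matched to the nearest known participant seat. This is a minimal sketch under those assumptions; the function names and geometry are illustrative, not the paper's actual integration algorithm.

```python
import math

def bearing_intersection(p1, theta1, p2, theta2):
    """Intersect two bearing rays from robots at p1 and p2 with azimuths
    theta1, theta2 (radians, world frame). Returns the (x, y) intersection,
    or None if the bearings are nearly parallel."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using 2x2 cross products.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

def identify_speaker(estimate, seats):
    """Return the index of the participant seat closest to the estimate."""
    return min(range(len(seats)), key=lambda i: math.dist(seats[i], estimate))

# Example: two robots placed apart so their bearings cross at the speaker.
robots = [(0.0, 0.0), (2.0, 0.0)]
seats = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]  # three participant positions
est = bearing_intersection(robots[0], math.atan2(2.0, 1.0),
                           robots[1], math.atan2(2.0, -1.0))
speaker = identify_speaker(est, seats)  # -> 1 (the middle seat)
```

Placing the robots so that their bearing lines cross at a wide angle is what lets one robot compensate for positions the other localizes poorly: a bearing that is ambiguous in range from one robot is disambiguated by the second robot's bearing.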


Notes

  1. http://www.aldebaran-robotics.com/en/.

  2. http://sslab.nuee.nagoya-u.ac.jp/en/?page_id=112.


Author information

Correspondence to Taichi Nakashima.


Copyright information

© 2014 Springer Science+Business Media New York

About this paper

Cite this paper

Nakashima, T., Komatani, K., Sato, S. (2014). Integration of Multiple Sound Source Localization Results for Speaker Identification in Multiparty Dialogue System. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_14

  • DOI: https://doi.org/10.1007/978-1-4614-8280-2_14
  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8279-6

  • Online ISBN: 978-1-4614-8280-2

  • eBook Packages: Engineering (R0)
