Integration of Multiple Sound Source Localization Results for Speaker Identification in Multiparty Dialogue System

  • Conference paper
  • In: Natural Interaction with Robots, Knowbots and Smartphones

Abstract

Humanoid robots need to turn toward human participants when answering their questions in multiparty dialogues. Some participant positions are difficult for a robot to localize in multiparty situations, especially when the robot can rely only on its own sensors. We present a method that identifies the speaker more accurately by integrating the multiple sound source localization results obtained from two robots: one talking mainly with the participants and the other joining the conversation when necessary. We place the robots so that they compensate for each other's localization capabilities and then integrate their two results. Our experimental evaluation revealed that using two robots improved speaker identification compared with using only one robot. We furthermore implemented our method on humanoid robots and constructed a demo system.
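The integration described above can be illustrated, in simplified form, as bearing-line triangulation: each robot reports an estimated direction of arrival for the active sound source, and the two bearings are intersected to estimate the speaker's position, which is then matched to the nearest known participant seat. This is a minimal sketch under those assumptions; the function names and geometry are illustrative, not the paper's actual integration algorithm.

```python
import math

def bearing_intersection(p1, theta1, p2, theta2):
    """Intersect two bearing rays from robots at p1 and p2 with azimuths
    theta1, theta2 (radians, world frame). Returns the (x, y) intersection,
    or None if the bearings are nearly parallel."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using 2x2 cross products.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

def identify_speaker(estimate, seats):
    """Return the index of the participant seat closest to the estimate."""
    return min(range(len(seats)), key=lambda i: math.dist(seats[i], estimate))

# Example: two robots placed apart so their bearings cross at the speaker.
robots = [(0.0, 0.0), (2.0, 0.0)]
seats = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]  # three participant positions
est = bearing_intersection(robots[0], math.atan2(2.0, 1.0),
                           robots[1], math.atan2(2.0, -1.0))
speaker = identify_speaker(est, seats)  # -> 1 (the middle seat)
```

Placing the robots so that their bearing lines cross at a wide angle is what lets one robot compensate for positions the other localizes poorly: a bearing that is ambiguous in range from one robot is disambiguated by the second robot's bearing.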


Notes

  1. http://www.aldebaran-robotics.com/en/.

  2. http://sslab.nuee.nagoya-u.ac.jp/en/?page_id=112.


Author information

Correspondence to Taichi Nakashima.


Copyright information

© 2014 Springer Science+Business Media New York

About this paper

Cite this paper

Nakashima, T., Komatani, K., Sato, S. (2014). Integration of Multiple Sound Source Localization Results for Speaker Identification in Multiparty Dialogue System. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_14

  • DOI: https://doi.org/10.1007/978-1-4614-8280-2_14
  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8279-6

  • Online ISBN: 978-1-4614-8280-2

  • eBook Packages: Engineering (R0)
