
Multi-Speaker Dialogue for Vehicular Navigation and Assistance

International Journal of Speech Technology

Abstract

Most current Spoken Dialogue Systems (SDS) handle only the interaction between the system and a single speaker. In some situations, however, several speakers interact with the system, and handling such multi-user situations requires new functions and improvements. Research on human-computer interaction systems that involve multiple users is still in its initial stages, and papers, lectures, and studies on the subject are very limited. This motivates our further study of Multi-Speaker Dialogue Systems (MSDS).

In this paper, the interactions between multiple speakers and the system are classified into three types: independent, cooperative, and conflicting. An algorithm for multi-speaker dialogue management is proposed. The algorithm determines the interaction type, integrates the conversation goals of multiple speakers, and keeps the interaction going smoothly. The experiments were carried out on a practical system that provides useful vehicular information to drivers and passengers. Experimental results show that the proposed algorithm properly handles the interactions that occur in a multi-speaker dialogue system. The task completion rate is 76%, and more than 61% of the testers are satisfied with the proposed MSDS.
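The abstract does not spell out how the algorithm decides among the three interaction types, so the following is only an illustrative sketch and not the authors' method. It assumes each speaker's conversation goal can be written as a task name plus a set of slot values. It further assumes that different tasks are independent, that the same task with a disagreeing slot is conflicting, and that the same task with compatible slots is cooperative. The names `Goal`, `classify`, and `integrate`, and the example tasks and slots, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Interaction(Enum):
    INDEPENDENT = "independent"
    COOPERATIVE = "cooperative"
    CONFLICTING = "conflicting"


@dataclass
class Goal:
    """One speaker's conversation goal (hypothetical representation)."""
    speaker: str
    task: str                                   # e.g. "find_restaurant"
    slots: dict = field(default_factory=dict)   # e.g. {"cuisine": "noodles"}


def classify(a: Goal, b: Goal) -> Interaction:
    """Assumed rule: different tasks are independent; the same task is
    conflicting if any shared slot disagrees, otherwise cooperative."""
    if a.task != b.task:
        return Interaction.INDEPENDENT
    shared = a.slots.keys() & b.slots.keys()
    if any(a.slots[k] != b.slots[k] for k in shared):
        return Interaction.CONFLICTING
    return Interaction.COOPERATIVE


def integrate(a: Goal, b: Goal):
    """Merge cooperative goals into one; otherwise keep both goals so the
    dialogue manager can serve them in turn or ask the speakers to resolve
    the conflict (that follow-up step is not modelled here)."""
    kind = classify(a, b)
    if kind is Interaction.COOPERATIVE:
        merged = Goal(f"{a.speaker}+{b.speaker}", a.task, {**a.slots, **b.slots})
        return kind, [merged]
    return kind, [a, b]


# Usage: a driver and a passenger refine the same request.
driver = Goal("driver", "find_restaurant", {"cuisine": "noodles"})
passenger = Goal("passenger", "find_restaurant", {"area": "downtown"})
kind, goals = integrate(driver, passenger)
print(kind.value, goals[0].slots)
```

Keeping classification separate from integration means the decision rule can be replaced, for example by a statistical classifier, without changing how goals are merged.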



Cite this article

Wang, HC., Wang, JF. Multi-Speaker Dialogue for Vehicular Navigation and Assistance. International Journal of Speech Technology 7, 231–244 (2004). https://doi.org/10.1023/B:IJST.0000017018.78522.58


  • DOI: https://doi.org/10.1023/B:IJST.0000017018.78522.58
