
Recent research advances in Reinforcement Learning in Spoken Dialogue Systems

Published online by Cambridge University Press: 01 December 2009

Matthew Frampton
Center for the Study of Language and Information, Stanford University, Stanford, CA 94305-4101, USA; e-mail: frampton@stanford.edu

Oliver Lemon
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK; e-mail: o.lemon@hw.ac.uk

Abstract

This paper summarizes and analyzes the work of the research groups that have recently made significant contributions to using Reinforcement Learning techniques to learn dialogue strategies for Spoken Dialogue Systems (SDSs). This use of stochastic planning and learning has become an important research area over the past 10 years because it promises automatic, data-driven optimization of SDS behavior that was previously hand-coded by expert developers. We survey the most important developments in the field, compare and contrast the different approaches, and describe current open problems.
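To make concrete what learning a dialogue strategy with Reinforcement Learning involves, here is a minimal sketch that frames a hypothetical two-slot form-filling task as a Markov Decision Process and learns a policy with tabular Q-learning against a simulated user. The slot statuses, the recognition and confirmation probabilities, the reward values, and the names `simulate` and `q_learning` are all illustrative assumptions; this is not the method of any particular system surveyed in the paper.

```python
import random
from collections import defaultdict

# Illustrative assumptions only: the task, probabilities and rewards are hypothetical.
# Slot status: 0 = empty, 1 = filled but unconfirmed, 2 = confirmed.
ACTIONS = [("ask", 0), ("ask", 1), ("confirm", 0), ("confirm", 1), ("close", None)]
P_UNDERSTOOD = 0.8  # assumed chance the user's answer is recognised when asked
P_CORRECT = 0.9     # assumed chance a filled value is correct, so the user says "yes"
TURN_COST, SUCCESS, FAILURE = -1.0, 20.0, -20.0


def simulate(state, action, rng):
    """Hypothetical simulated user with noisy recognition: returns (next_state, reward, done)."""
    kind, slot = action
    slots = list(state)
    if kind == "close":
        bonus = SUCCESS if all(s == 2 for s in slots) else FAILURE
        return state, TURN_COST + bonus, True
    if kind == "ask" and slots[slot] == 0 and rng.random() < P_UNDERSTOOD:
        slots[slot] = 1
    elif kind == "confirm" and slots[slot] == 1:
        # "Yes" confirms the value; "no" discards it, so the slot must be asked again.
        slots[slot] = 2 if rng.random() < P_CORRECT else 0
    return tuple(slots), TURN_COST, False


def q_learning(episodes=20_000, alpha=0.1, gamma=0.95, epsilon=0.2, max_turns=30, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the greedy policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated discounted return

    def greedy(state):
        return max(ACTIONS, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_turns):
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            nxt, reward, done = simulate(state, action, rng)
            # Bootstrap from the best next action unless the dialogue has ended.
            target = reward if done else reward + gamma * q[(nxt, greedy(nxt))]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return greedy


policy = q_learning()
for s in [(0, 0), (1, 2), (2, 1), (2, 2)]:
    print(s, policy(s))
```

Under these assumed parameters the greedy policy asks for and confirms both slots before closing, since closing early incurs the failure penalty. Bootstrapping continues when an episode is cut off at `max_turns`, because hitting the turn limit truncates the dialogue rather than ending it.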

Type: Articles
Copyright © Cambridge University Press 2009

