DOI: 10.1145/3459637.3482333
research-article

Corrective Guidance and Learning for Dialogue Management

Published: 30 October 2021

ABSTRACT

Establishing a robust dialogue policy at low computational cost is challenging, especially for multi-domain task-oriented dialogue management, owing to the high complexity of the state and action spaces. Previous works, mostly based on deterministic policy optimization, attain only moderate performance. Meanwhile, the state-of-the-art end-to-end approach is computationally demanding, since it relies on a large-scale language model based on the generative pre-trained transformer-2 (GPT-2). In this study, a new learning procedure consisting of three stages is presented to improve multi-domain dialogue management with corrective guidance. First, behavior cloning with an auxiliary task is developed to build a robust pre-trained model by mitigating the causal confusion problem in imitation learning. Next, the pre-trained model is refined by reinforcement learning via proximal policy optimization. Lastly, a human-in-the-loop learning strategy is applied to enhance agent performance by providing corrective feedback directly from a rule-based agent, so that the agent is prevented from getting trapped in confounded states. Experiments on end-to-end evaluation show that the proposed learning method achieves a state-of-the-art result, performing nearly identically to the rule-based agent. The method outperforms the second-place system of the 9th Dialog System Technology Challenge (DSTC9) track 2, which uses GPT-2 as the core model in dialogue management.
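The three-stage procedure sketched in the abstract can be illustrated with toy loss functions. This is a minimal NumPy sketch, not the paper's implementation: the network sizes, the linear policy, and the choice of state reconstruction as the auxiliary task are all illustrative assumptions; only the PPO clipped surrogate follows the standard form from Schulman et al. (2017).

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACT_DIM = 8, 4  # toy sizes, purely illustrative

# Hypothetical linear policy over multi-hot dialogue acts, plus an
# auxiliary head; real systems would use deeper networks.
W_act = rng.normal(scale=0.1, size=(STATE_DIM, ACT_DIM))
W_aux = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y, eps=1e-9):
    # binary cross-entropy over multi-hot action labels
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).mean()

def bc_loss(state, expert_act, aux_weight=0.1):
    """Stage 1: behavior cloning plus an auxiliary objective (here, state
    reconstruction) standing in for the causal-confusion mitigation."""
    imitation = bce(sigmoid(state @ W_act), expert_act)
    auxiliary = ((state @ W_aux - state) ** 2).mean()
    return imitation + aux_weight * auxiliary

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Stage 2: PPO clipped surrogate objective (Schulman et al., 2017),
    where `ratio` is pi_new(a|s) / pi_old(a|s)."""
    return -np.minimum(ratio * advantage,
                       np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()

def correction_loss(state, rule_act):
    """Stage 3: corrective guidance, supervising the policy with the
    rule-based agent's action on states where the policy goes wrong."""
    return bce(sigmoid(state @ W_act), rule_act)
```

In this reading, stages 1 and 3 are both supervised objectives; what changes is the teacher (corpus demonstrations vs. the rule-based agent's corrections) and when the signal is applied, while stage 2 optimizes the task reward directly.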



Published in

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021, 4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

              Copyright © 2021 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States



Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
