ABSTRACT
Establishing a robust dialogue policy at low computational cost is challenging, especially for multi-domain task-oriented dialogue management, owing to the high complexity of the state and action spaces. Previous works, which mostly rely on deterministic policy optimization, attain only moderate performance, while the state-of-the-art end-to-end approach is computationally demanding because it builds on a large-scale language model, the generative pre-trained transformer-2 (GPT-2). In this study, a new learning procedure consisting of three stages is presented to improve multi-domain dialogue management with corrective guidance. First, behavior cloning with an auxiliary task is developed to build a robust pre-trained model by mitigating the causal confusion problem in imitation learning. Next, the pre-trained model is rectified by reinforcement learning via proximal policy optimization. Finally, a human-in-the-loop learning strategy enhances agent performance by providing corrective feedback directly from a rule-based agent, preventing the agent from becoming trapped in confounded states. Experiments on end-to-end evaluation show that the proposed learning method achieves a state-of-the-art result, performing nearly identically to the rule-based agent, and outperforms the second-place system of the 9th dialog system technology challenge (DSTC9) track 2, which uses GPT-2 as its core dialogue-management model.
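The three-stage procedure described above can be sketched in miniature. The toy policy, the clipped update standing in for PPO, and the drift threshold used as a proxy for a "confounded state" are all illustrative assumptions, not the authors' implementation; the sketch only shows how the stages compose: supervised imitation of a rule-based expert, a clipped RL refinement step, and a corrective override when the agent strays too far from the expert.

```python
# Hypothetical sketch of the three-stage learning procedure: behavior
# cloning, a PPO-style clipped refinement, then corrective feedback from
# a rule-based expert. All names and constants here are illustrative.

EXPERT_ACTION = 1.0  # the rule-based agent's action for the toy state


def expert_policy(state):
    # Stand-in for the rule-based agent used as the corrective teacher.
    return EXPERT_ACTION


class ToyPolicy:
    """One-parameter policy: predicts a scalar action for any state."""

    def __init__(self):
        self.w = 0.0

    def act(self, state):
        return self.w

    def update(self, target, lr=0.5):
        # Gradient step on squared error toward the target action.
        self.w += lr * (target - self.w)


policy = ToyPolicy()

# Stage 1: behavior cloning -- supervised imitation of expert actions.
for _ in range(10):
    policy.update(expert_policy(state=None))


# Stage 2: RL refinement -- a clipped update standing in for PPO, which
# bounds each policy change to keep it close to the pre-trained model.
def clipped_step(policy, advantage, eps=0.2):
    step = max(-eps, min(eps, advantage))
    policy.w += step


clipped_step(policy, advantage=0.05)


# Stage 3: human-in-the-loop correction -- when the agent's action drifts
# from the expert's (a proxy for a confounded state), the expert overrides.
def corrected_action(policy, state, tol=0.1):
    a = policy.act(state)
    e = expert_policy(state)
    return e if abs(a - e) > tol else a


final = corrected_action(policy, state=None)
```

After behavior cloning the toy weight sits near the expert's action, the clipped step nudges it by at most `eps`, and the correction stage only overrides the agent when it drifts beyond the tolerance, mirroring how corrective guidance is applied only in problematic states.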
Index Terms
- Corrective Guidance and Learning for Dialogue Management