Abstract
Consistency identification in task-oriented dialogue (CI-ToD), which can prevent inconsistent dialogue response generation, has recently emerged as an important and growing research area. This paper takes the first step toward exploring a pre-training paradigm for CI-ToD. Pre-training for CI-ToD is non-trivial, however, because it requires a large amount of multi-turn KB-grounded dialogues, which are extremely hard to collect. To alleviate this data scarcity problem, we introduce a modularized pre-training framework (MPFToD) that is capable of utilizing large amounts of KB-free dialogues. Specifically, such modularization allows us to decouple CI-ToD into three sub-modules and to propose three pre-training tasks: (i) query response matching pre-training; (ii) dialogue history consistency identification pre-training; and (iii) KB masked language modeling, each enhancing a different ability of the CI-ToD model. Because the sub-tasks are solved separately, MPFToD can learn each module from large amounts of KB-free dialogues, which are much easier to obtain. Results on the CI-ToD benchmark show that MPFToD pushes the state-of-the-art performance from 56.3% to 61.0%. Furthermore, we show its transferability with promising performance on other downstream tasks (i.e., dialog act recognition, sentiment classification and table fact checking).
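As a minimal illustration of what the KB masked language modeling objective (iii) might look like, the sketch below masks KB entity mentions in a dialogue utterance so that a model would have to recover them from context. The tokenization, the entity set, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: mask tokens that match KB entities so an MLM-style model
# must predict them back. Whitespace tokenization and the entity set below
# are simplifying assumptions for illustration only.

def mask_kb_entities(tokens, kb_entities, mask_token="[MASK]"):
    """Return (masked sequence, labels): KB-entity tokens are replaced by
    the mask token and become prediction targets; other positions get None."""
    masked, labels = [], []
    for tok in tokens:
        if tok in kb_entities:
            masked.append(mask_token)
            labels.append(tok)   # the model must recover this KB entity
        else:
            masked.append(tok)
            labels.append(None)  # position is not scored
    return masked, labels

utterance = "the valero station is at 200 alester ave".split()
kb = {"valero", "200", "alester"}  # hypothetical KB entity mentions
masked, labels = mask_kb_entities(utterance, kb)
print(" ".join(masked))  # the [MASK] station is at [MASK] [MASK] ave
```

In an actual pre-training setup the masked positions would feed a transformer encoder with a cross-entropy loss over the vocabulary; this fragment only shows the data preparation step.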
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62306342, 62176076) and the Excellent Young Scientists Fund of Hunan Province (2024JJ4070). This work was also partially supported by the Natural Science Foundation of Guangdong (2023A1515012922), the Shenzhen Foundational Research Funding (JCYJ20220818102415032), the Major Key Project of PCL (PCL2023A09), and the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005k). We are grateful for resources from the High Performance Computing Center of Central South University. Libo Qin is the corresponding author.
Ethics declarations
Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Libo Qin received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China. He is a professor at Central South University, China. His current research interests include natural language processing and dialogue systems.
Shijue Huang is currently working toward his master's degree at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. His research interests include natural language processing and dialogue systems.
Qiguang Chen is a master's student at the Harbin Institute of Technology (HIT), China. His research interests include natural language processing and dialogue systems.
Qian Liu is a Research Scientist at Sea AI Lab, Singapore. His research interests include semantic parsing, dialogue systems, and natural language processing. He has published several papers in top-tier conferences (ICLR/NeurIPS/ACL/EMNLP).
Wanxiang Che received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China in 2008. He is a Full Professor in the School of Computer Science and Technology, HIT. His current research interests include natural language processing and dialogue systems.
Ruifeng Xu received his PhD degree in computer science from The Hong Kong Polytechnic University, China. He is currently a Professor at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. He has published more than 100 papers on natural language processing, sentiment analysis, and social media analysis.
Cite this article
Qin, L., Huang, S., Chen, Q. et al. MPFToD: a modularized pre-training framework for consistency identification in task-oriented dialogue. Front. Comput. Sci. 19, 1910351 (2025). https://doi.org/10.1007/s11704-024-3778-9