Abstract
Consistency identification in task-oriented dialogue (CI-ToD), which can prevent inconsistent dialogue response generation, has recently emerged as an important and growing research area. This paper takes the first step toward exploring a pre-training paradigm for CI-ToD. Pre-training for CI-ToD is non-trivial, however, because it requires a large amount of multi-turn KB-grounded dialogues, which are extremely hard to collect. To alleviate this data scarcity problem, we introduce a modularized pre-training framework (MPFToD) that is capable of utilizing large amounts of KB-free dialogues. Specifically, such modularization allows us to decouple CI-ToD into three sub-modules and to propose three pre-training tasks: (i) query response matching pre-training; (ii) dialogue history consistency identification pre-training; and (iii) KB masked language modeling, each enhancing a different ability of the CI-ToD model. Because the sub-tasks are solved separately, MPFToD can learn each module from large amounts of KB-free dialogues, which are much easier to obtain. Results on the CI-ToD benchmark show that MPFToD pushes the state-of-the-art performance from 56.3% to 61.0%. Furthermore, we show its transferability with promising performance on other downstream tasks (i.e., dialog act recognition, sentiment classification and table fact checking).
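As a minimal illustration of what the KB masked language modeling objective (iii) might look like, the sketch below masks KB entity mentions in a dialogue utterance so that a model would have to recover them from context. The tokenization, the entity set, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: mask tokens that match KB entities so an MLM-style model
# must predict them back. Whitespace tokenization and the entity set below
# are simplifying assumptions for illustration only.

def mask_kb_entities(tokens, kb_entities, mask_token="[MASK]"):
    """Return (masked sequence, labels): KB-entity tokens are replaced by
    the mask token and become prediction targets; other positions get None."""
    masked, labels = [], []
    for tok in tokens:
        if tok in kb_entities:
            masked.append(mask_token)
            labels.append(tok)   # the model must recover this KB entity
        else:
            masked.append(tok)
            labels.append(None)  # position is not scored
    return masked, labels

utterance = "the valero station is at 200 alester ave".split()
kb = {"valero", "200", "alester"}  # hypothetical KB entity mentions
masked, labels = mask_kb_entities(utterance, kb)
print(" ".join(masked))  # the [MASK] station is at [MASK] [MASK] ave
```

In an actual pre-training setup the masked positions would feed a transformer encoder with a cross-entropy loss over the vocabulary; this fragment only shows the data preparation step.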
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62306342, 62176076) and the Excellent Young Scientists Fund of Hunan Province (2024JJ4070). This work was also partially supported by the Natural Science Foundation of Guangdong (2023A1515012922), the Shenzhen Foundational Research Funding (JCYJ20220818102415032), the Major Key Project of PCL (PCL2023A09), and the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005k). We are grateful for resources from the High Performance Computing Center of Central South University. Libo Qin is the corresponding author.
Ethics declarations
Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Libo Qin received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China. He is a professor at Central South University, China. His current research interests include natural language processing and dialogue systems.
Shijue Huang is currently working toward his master's degree at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. His research interests include natural language processing and dialogue systems.
Qiguang Chen is a master's student at the Harbin Institute of Technology (HIT), China. His research interests include natural language processing and dialogue systems.
Qian Liu is a Research Scientist at Sea AI Lab, Singapore. His research interests include semantic parsing, dialogue systems, and natural language processing. He has published several papers in top-tier conferences (ICLR/NeurIPS/ACL/EMNLP).
Wanxiang Che received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China in 2008. He is a Full Professor in the School of Computer Science and Technology, HIT. His current research interests include natural language processing and dialogue systems.
Ruifeng Xu received his PhD degree in computer science from The Hong Kong Polytechnic University, China. He is currently a Professor at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. He has published more than 100 papers on natural language processing, sentiment analysis, and social media analysis.
Cite this article
Qin, L., Huang, S., Chen, Q. et al. MPFToD: a modularized pre-training framework for consistency identification in task-oriented dialogue. Front. Comput. Sci. 19, 1910351 (2025). https://doi.org/10.1007/s11704-024-3778-9