MPFToD: a modularized pre-training framework for consistency identification in task-oriented dialogue

  • Research Article
  • Published in Frontiers of Computer Science (2025)

Abstract

Consistency identification in task-oriented dialogue (CI-ToD) prevents inconsistent dialogue response generation and has recently emerged as an important and growing research area. This paper takes the first step toward exploring a pre-training paradigm for CI-ToD. Pre-training for CI-ToD is non-trivial, however, because it requires large amounts of multi-turn KB-grounded dialogues, which are extremely hard to collect. To alleviate this data scarcity problem, we introduce a modularized pre-training framework (MPFToD) that can exploit large amounts of KB-free dialogues. Specifically, modularization allows us to decouple CI-ToD into three sub-modules, for which we propose three pre-training tasks: (i) query-response matching pre-training; (ii) dialogue history consistency identification pre-training; and (iii) KB masked language modeling, each enhancing a different ability of the CI-ToD model. Because the sub-tasks are solved separately, MPFToD can learn each module from large amounts of KB-free dialogues, which are much easier to obtain. Results on the CI-ToD benchmark show that MPFToD pushes the state-of-the-art performance from 56.3% to 61.0%. Furthermore, we show its transferability, with promising performance on other downstream tasks (i.e., dialog act recognition, sentiment classification, and table fact checking).
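To make the modular design concrete, below is a minimal, hypothetical PyTorch sketch of how one shared encoder could serve the three pre-training tasks through separate heads. This is not the authors' released implementation; all class, head, and task names here are illustrative assumptions.

```python
# A minimal, hypothetical sketch of MPFToD-style modular pre-training.
# Class and head names are illustrative assumptions, not the authors' code.
import torch.nn as nn
from transformers import AutoModel

class MPFToDSketch(nn.Module):
    """One shared encoder with a separate head per pre-training task."""

    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # (i) Query-response matching: does this response fit the query?
        self.qrm_head = nn.Linear(hidden, 2)
        # (ii) Dialogue history consistency identification: binary label.
        self.dhci_head = nn.Linear(hidden, 2)
        # (iii) KB masked language modeling: recover masked KB tokens.
        self.kb_mlm_head = nn.Linear(hidden, self.encoder.config.vocab_size)

    def forward(self, input_ids, attention_mask, task: str):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS]-style pooled representation
        if task == "qrm":
            return self.qrm_head(cls)
        if task == "dhci":
            return self.dhci_head(cls)
        if task == "kb_mlm":
            return self.kb_mlm_head(out.last_hidden_state)  # per-token logits
        raise ValueError(f"unknown task: {task}")
```

Under this sketch, each pre-training batch is drawn from whichever KB-free corpus matches its task and routed to the corresponding head, so no single corpus needs to be both multi-turn and KB-grounded; the shared encoder would then be fine-tuned on the CI-ToD benchmark.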



Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62306342, 62176076) and the Excellent Young Scientists Fund in Hunan Province (2024JJ4070). It was also partially supported by the Natural Science Foundation of Guangdong (2023A1515012922), Shenzhen Foundational Research Funding (JCYJ20220818102415032), the Major Key Project of PCL (PCL2023A09), and the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005k). We are grateful for resources from the High Performance Computing Center of Central South University. Libo Qin is the corresponding author.

Author information

Corresponding author

Correspondence to Libo Qin.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Libo Qin received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China. He is a professor at Central South University, China. His current research interests include natural language processing and dialogue systems.

Shijue Huang is currently working toward a master's degree at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. His research interests include natural language processing and dialogue systems.

Qiguang Chen is a master's student at Harbin Institute of Technology (HIT), China. His research interests include natural language processing and dialogue systems.

Qian Liu is a Research Scientist at Sea AI Lab, Singapore. His research interests include semantic parsing, dialogue systems, and natural language processing. He has published several papers in top-tier conferences (ICLR/NeurIPS/ACL/EMNLP).

Wanxiang Che received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China, in 2008. He is a Full Professor at the School of Computer Science and Technology, HIT. His current research interests include natural language processing and dialogue systems.

Ruifeng Xu received his PhD degree in computer science from The Hong Kong Polytechnic University, China. He is currently a Professor at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. He has published more than 100 papers on natural language processing, sentiment analysis, and social media analysis.

About this article

Cite this article

Qin, L., Huang, S., Chen, Q. et al. MPFToD: a modularized pre-training framework for consistency identification in task-oriented dialogue. Front. Comput. Sci. 19, 1910351 (2025). https://doi.org/10.1007/s11704-024-3778-9
