
Dialogue Specific Pre-training Tasks for Improved Dialogue State Tracking


Abstract

Although pre-trained language models are widely used in dialogue state tracking, little work exists on pre-training tasks designed specifically for dialogue state tracking. From this perspective, we propose simple and effective pre-training tasks for language models that are tailored to dialogue state tracking. The first is modified slot prediction, a binary prediction task that detects slots whose values change from the previous turn to the current turn of the dialogue. The second is next dialogue prediction, another binary prediction task that determines whether part of a given recent dialogue context has been replaced with excerpts from other dialogues. Experimental results suggest that combining our pre-training tasks yields a significant improvement of 4.51 percentage points over a model trained without additional pre-training. This is notable in that no additional data is used for the pre-training. In addition, ablation studies show how each pre-training task affects performance. Specifically, our pre-training tasks work best when applied in the pre-training phase rather than in the fine-tuning phase, and longer pre-training improves fine-tuning performance.
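
To make the two tasks concrete, the following is a minimal Python sketch of how their training examples could be constructed from annotated dialogues. The function names, data layout, and corruption heuristic are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch of training-example construction for the two
# proposed pre-training tasks. Names and data layout are illustrative
# assumptions, not the authors' released code.

def modified_slot_labels(prev_state, curr_state, slot_list):
    """Binary label per slot: 1 if the slot's value changed from the
    previous turn to the current turn, else 0."""
    return [int(prev_state.get(slot) != curr_state.get(slot))
            for slot in slot_list]

def next_dialogue_example(context_turns, other_dialogues, p_replace=0.5):
    """With probability p_replace, replace the tail of the recent dialogue
    context with an excerpt from a different dialogue; the model is then
    trained to predict whether a replacement occurred (label 1) or not
    (label 0). Assumes len(context_turns) >= 2."""
    if random.random() < p_replace:
        excerpt = random.choice(other_dialogues)
        keep = random.randint(1, len(context_turns) - 1)
        corrupted = context_turns[:keep] + excerpt[: len(context_turns) - keep]
        return corrupted, 1
    return list(context_turns), 0

# Toy MultiWOZ-style dialogue states for a single turn transition.
prev = {"hotel-area": "north", "hotel-stars": "4"}
curr = {"hotel-area": "centre", "hotel-stars": "4"}
print(modified_slot_labels(prev, curr, ["hotel-area", "hotel-stars"]))  # -> [1, 0]
```

Under this reading, modified slot prediction gives the encoder turn-level supervision about state changes, while next dialogue prediction rewards sensitivity to whether the recent context is coherent with the rest of the dialogue.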



Acknowledgements

This work was supported by the Korean National Police Agency [Pol-Bot Development for Conversational Police Knowledge Services / PR09-01-000-20] and the Institute of Information & Communications Technology Planning & Evaluation (IITP) (Project No. 2021-0-00469).

Author information

Corresponding author

Correspondence to Misuk Kim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Convergence Analysis

See Fig. 3.

Fig. 3 Loss graph for the training and validation data during the fine-tuning phase of our proposed model

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

An, J., Kim, M. Dialogue Specific Pre-training Tasks for Improved Dialogue State Tracking. Neural Process Lett 55, 7761–7776 (2023). https://doi.org/10.1007/s11063-023-11283-4

