
Multi-source inverse-curriculum-based training for low-resource dialogue generation


Abstract

An effective dialogue system requires a large amount of training data, but existing dialogue corpora are often insufficient. Although pre-trained models have made great progress in recent years and can alleviate the low-resource dialogue problem to some extent, they are large and difficult to deploy. How to improve the performance of a dialogue model without additional annotated data or an increase in model size has therefore become a new challenge. We propose a multi-source data augmentation method for low-resource dialogue generation that exploits inverse curriculum learning (inverse CL). First, we adopt three data augmentation methods, round-trip translation, paraphrasing, and a pre-trained model, to generate augmented data. Next, we propose a new training strategy based on inverse CL to exploit the different augmentation sources. Our method outperforms the baselines on all evaluation metrics, which demonstrates the effectiveness of the proposed training strategy for dialogue generation. To the best of our knowledge, this is the first systematic investigation of data augmentation for dialogue generation.
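
The training strategy sketched in the abstract can be illustrated in a few lines of code. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each data source is a list of (context, response) pairs, that the noisier augmented sources are scheduled before the clean gold data (the "inverse" of an easy-first curriculum), and that a train_epoch callback performs one optimization pass; all names and the staging order are assumptions.

```python
from typing import Callable, List, Tuple

# A (dialogue context, response) training pair.
Pair = Tuple[str, str]

def inverse_curriculum_train(
    model,
    train_epoch: Callable[[object, List[Pair]], None],  # one optimization pass over a dataset
    gold: List[Pair],                 # original low-resource dialogues
    augmented: List[List[Pair]],      # e.g. [round_trip, paraphrase, pretrained] (assumed ordering)
    epochs_per_stage: int = 1,
):
    """Inverse curriculum: consume the (noisier) augmented sources first,
    so the final parameter updates come from the clean gold data."""
    stages = augmented + [gold]       # gold data is scheduled last
    for stage_data in stages:
        for _ in range(epochs_per_stage):
            train_epoch(model, stage_data)
    return model
```

Under such a schedule the model first adapts to abundant but noisy augmented pairs and is then refined on the scarce gold dialogues, consistent with the abstract's goal of improving performance without extra annotation or a larger model.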


Notes

  1. https://github.com/marian-nmt/marian

  2. https://github.com/vsuthichai/paraphraser

  3. https://github.com/LHolten/DialoGPT-MMI-decoder


Acknowledgements

The research work described in this paper has been supported by the National Key R&D Program of China (2019YFB1405200) and the National Natural Science Foundation of China (No. 61976015, 61976016, 61876198 and 61370130). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this paper.

Author information


Corresponding author

Correspondence to Jinan Xu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cui, F., Di, H., Huang, H. et al. Multi-source inverse-curriculum-based training for low-resource dialogue generation. Appl Intell 53, 13665–13676 (2023). https://doi.org/10.1007/s10489-022-04190-z

