
Semantic-aware conditional variational autoencoder for one-to-many dialogue generation

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Due to the pervasive semantic ambiguity of open-domain conversation, current deep dialogue models fail to detect potential emotional and action response features in the latent space, which leads them to produce inaccurate and irrelevant sentences. To address this problem, we propose a semantic-aware conditional variational autoencoder that discriminates sentiment and action response features in the latent space for one-to-many open-domain dialogue generation. Specifically, the proposed module leverages explicit controllable variables to create diverse conversational texts. These controllable variables constrain the distribution of the latent space, disentangling its features during training. Furthermore, this feature disentanglement improves dialogue generation in terms of both interpretability and text quality, and reveals how the latent features of different emotions shape the logic of text generation.
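
To make the mechanism concrete, below is a minimal sketch (in PyTorch, assuming Gaussian latents) of a conditional VAE in which an explicit controllable variable conditions both the recognition and prior networks, so the latent distribution is constrained by a sentiment/action label. The module name `MiniCVAE`, its layers, and all dimensions are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of the conditional-VAE mechanism described above:
# a controllable label c conditions both the recognition (posterior) and
# prior networks. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class MiniCVAE(nn.Module):
    def __init__(self, ctx_dim=256, label_dim=8, z_dim=64):
        super().__init__()
        # Recognition network q(z | context, response, c)
        self.post_net = nn.Linear(ctx_dim * 2 + label_dim, z_dim * 2)
        # Prior network p(z | context, c); conditioning on c is what
        # constrains the latent space and encourages disentanglement.
        self.prior_net = nn.Linear(ctx_dim + label_dim, z_dim * 2)

    def forward(self, ctx, resp, c):
        mu_q, logvar_q = self.post_net(torch.cat([ctx, resp, c], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior_net(torch.cat([ctx, c], -1)).chunk(2, -1)
        # Reparameterization trick: sample z from the posterior
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        # KL(q || p) between two diagonal Gaussians, summed over latent dims;
        # added to the decoder's reconstruction loss to form the ELBO.
        kl = 0.5 * (
            logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
            - 1.0
        ).sum(-1)
        return z, kl


# Usage: at inference time, sampling z from the prior under different labels
# c would steer the decoder toward different emotional or action styles.
model = MiniCVAE()
ctx, resp, c = torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 8)
z, kl = model(ctx, resp, c)
```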



Acknowledgements

This work was partly supported by the National Key R&D Program of China (2019YFB2103000), the National Natural Science Foundation of China (62136002, 62102057, and 61876027), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202100627 and KJQN202100629), and the National Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002).

Author information

Corresponding author

Correspondence to Hong Yu.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, Y., Liao, J., Yu, H. et al. Semantic-aware conditional variational autoencoder for one-to-many dialogue generation. Neural Comput & Applic 34, 13683–13695 (2022). https://doi.org/10.1007/s00521-022-07182-9

