Abstract
End-to-end deep-learning-based open-domain dialogue systems remain black-box models, which makes it easy for data-driven models to generate irrelevant content. Specifically, latent variables are highly entangled with different semantics in the latent space because no prior knowledge guides the training. To address this problem, this paper proposes to equip the generative model with prior knowledge through a cognitive approach involving feature disentanglement. In particular, the model integrates guided-category knowledge with open-domain dialogue data during training, injecting the prior knowledge into the latent space and thereby enabling the model to disentangle the latent variables. In addition, this paper proposes a new metric for open-domain dialogues that objectively evaluates the interpretability of the latent-space distribution. Finally, we validate our model on several datasets and experimentally demonstrate that it generates higher-quality and more interpretable dialogues than competing models.
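The core idea of injecting category knowledge into the latent space can be illustrated with a minimal sketch: instead of regularizing the approximate posterior toward a single standard Gaussian, the KL term of the ELBO is computed against a per-category prior, so that latent codes of different semantic categories are pulled to separate regions. This is a simplified illustration of the general technique, not the paper's actual model; the priors, dimensions, and function names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

# Hypothetical category-conditioned priors: one Gaussian per guided category,
# so utterances of different categories occupy separate latent regions.
n_categories, latent_dim = 4, 8
prior_mu = rng.normal(scale=2.0, size=(n_categories, latent_dim))
prior_logvar = np.zeros((n_categories, latent_dim))

def category_guided_kl(mu_q, logvar_q, category):
    """KL regularizer of the ELBO, taken against the prior of the utterance's category."""
    return kl_diag_gaussians(mu_q, logvar_q, prior_mu[category], prior_logvar[category])

# An encoder output close to its own category prior incurs a small KL penalty,
# while matching it against another category's prior costs much more.
mu_q = prior_mu[1] + 0.1
logvar_q = np.zeros(latent_dim)
kl_same = category_guided_kl(mu_q, logvar_q, category=1)
kl_other = category_guided_kl(mu_q, logvar_q, category=2)
print(kl_same < kl_other)
```

Training with such a category-dependent KL term encourages each latent dimension group to align with one semantic category, which is one standard way to make the latent-space distribution interpretable.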






Data Availability Statement
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was partly supported by the National Key R&D Program of China (2021YFF0704100), the National Natural Science Foundation of China (62136002, 61936001 and 61876027), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202100627 and KJQN202100629), and the Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002 and cstc2022ycjh-bgzxm0004).
Ethics declarations
Conflict of interest
The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Liao, J., Yu, H. et al. Interpreting open-domain dialogue generation by disentangling latent feature representations. Neural Comput & Applic 35, 20855–20867 (2023). https://doi.org/10.1007/s00521-023-08815-3