Abstract
Emotional conversation plays a vital role in making dialogue systems more human-like. Although previous work on emotional conversation generation has achieved promising results, inconsistency in speaking style remains a problem. In this paper, we propose a Style-Aware Emotional Dialogue System (SEDS) that improves speaking style consistency by detecting the user’s emotions and modeling speaking styles during emotional response generation. Specifically, SEDS uses an emotion encoder to perceive the user’s emotion from multimodal inputs, and tracks speaking styles by jointly optimizing a generator augmented with a personalized lexicon that captures explicit, word-level speaking style features. Additionally, we propose an auxiliary speaking style classification task that guides SEDS to learn the implicit form of speaking style during training. To verify the effectiveness of the model, we construct a multimodal dialogue dataset and align and annotate it. Experimental results show that SEDS achieves a significant improvement over strong baseline models in terms of perplexity, emotion accuracy and style consistency.
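The abstract does not give the training objective, so the following is only a minimal sketch of how a generation loss and an auxiliary style classification loss might be combined in a jointly optimized model, and how perplexity follows from the generation loss. The function names, the additive form of the objective, and the `aux_weight` hyperparameter are all assumptions made for illustration, not details taken from the paper.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability distribution."""
    return -math.log(probs[target_index])


def joint_loss(token_probs, token_targets, style_probs, style_target, aux_weight=0.5):
    """Hypothetical joint objective (an assumption, not the paper's formula).

    token_probs:   one predicted distribution over the vocabulary per response token
    token_targets: index of the reference token at each position
    style_probs:   predicted distribution over speaking-style classes
    style_target:  index of the reference speaking style
    aux_weight:    assumed weight on the auxiliary style classification loss
    """
    # Mean token-level negative log-likelihood of the reference response.
    gen_loss = sum(
        cross_entropy(p, t) for p, t in zip(token_probs, token_targets)
    ) / len(token_targets)
    # Auxiliary loss: how well the model identifies the speaking style.
    style_loss = cross_entropy(style_probs, style_target)
    total = gen_loss + aux_weight * style_loss
    return total, gen_loss, style_loss


def perplexity(gen_loss):
    """Perplexity is the exponential of the mean token-level negative log-likelihood."""
    return math.exp(gen_loss)


# Toy example: a two-token response over a two-word vocabulary, two style classes.
total, gen, style = joint_loss(
    token_probs=[[0.5, 0.5], [0.25, 0.75]],
    token_targets=[0, 0],
    style_probs=[0.5, 0.5],
    style_target=1,
)
print(total, gen, style, perplexity(gen))
```

In a sketch like this, a larger `aux_weight` pushes the shared representation harder toward separating speaking styles, at the possible cost of generation quality; how the actual system balances the two is not stated in the abstract.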
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., Yang, Y., Chen, C., He, L., Yu, Z. (2020). Generating Emotional Social Chatbot Responses with a Consistent Speaking Style. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_5
DOI: https://doi.org/10.1007/978-3-030-60457-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8
eBook Packages: Computer Science, Computer Science (R0)