Sentiment Guided Aspect Conditioned Dialogue Generation in a Multimodal System

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13185)

Abstract

Multimodality in conversations has become critical for gaining a complete grasp of the user’s intention and providing better replies to satisfy customers’ requirements. Existing multimodal conversational systems suffer from contradictions and generic responses. User sentiments towards the different aspects of a product/service are essential for comprehending the needs of the user and responding in an informative and interactive manner. In this regard, we propose the novel task of sentiment-guided aspect-controlled response generation. This task is introduced to ensure consistency and coherence with the users’ sentiments towards the aspects mentioned in the ongoing dialogue, thereby generating better responses. In our work, we design a generative framework that utilizes the sentiment information of the previous utterances in a reinforced hierarchical transformer-based network. The decoder is explicitly provided with aspect knowledge for generation. We devise task-specific rewards that guide the generation process in an end-to-end manner. The multi-domain multi-modal dialogue (MDMMD) dataset, which includes both text and images, is used to validate our proposed architecture. Quantitative and qualitative analyses show that the proposed network generates consistent and diverse responses, and outperforms existing frameworks.
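The abstract's reward-guided, end-to-end training can be sketched as a REINFORCE-style, advantage-weighted sequence loss in which a task-specific reward of a sampled response is compared against a baseline response. This is a minimal illustrative sketch, not the authors' implementation: the function names, shapes, and toy reward values are all assumptions.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last (vocabulary) axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def reinforce_loss(logits, sampled_ids, reward, baseline):
    """Advantage-weighted negative log-likelihood (policy-gradient sketch).

    logits:      (batch, seq_len, vocab) decoder scores
    sampled_ids: (batch, seq_len) tokens sampled from the decoder
    reward:      (batch,) task-specific reward of each sampled response
    baseline:    (batch,) reward of a baseline (e.g. greedy) response
    """
    lp = log_softmax(logits)
    b, t = sampled_ids.shape
    # Log-probability of each sampled token.
    tok_lp = lp[np.arange(b)[:, None], np.arange(t)[None, :], sampled_ids]
    advantage = (reward - baseline)[:, None]          # (batch, 1)
    # Maximising expected reward == minimising -(advantage * log-prob).
    return float(-(advantage * tok_lp).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 6))
sampled = rng.integers(0, 6, size=(2, 4))
reward = np.array([0.9, 0.6])    # toy task-specific rewards of sampled responses
baseline = np.array([0.5, 0.5])  # toy rewards of baseline responses
loss = reinforce_loss(logits, sampled, reward, baseline)
```

Responses rewarded above the baseline have their token log-probabilities pushed up, which is how the task-specific rewards can steer generation without the reward itself being differentiable.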


Notes

  1. https://pytorch.org/.


Acknowledgement

The authors duly acknowledge the support of the project titled “Sevak-An Intelligent Indian Language Chatbot”, sponsored by SERB, Govt. of India. Asif Ekbal acknowledges the Young Faculty Research Fellowship (YFRF), supported by the Visvesvaraya PhD Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, implemented by Digital India Corporation (formerly Media Lab Asia).

Author information

Corresponding author

Correspondence to Mauajama Firdaus.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Firdaus, M., Thakur, N., Ekbal, A. (2022). Sentiment Guided Aspect Conditioned Dialogue Generation in a Multimodal System. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_14

  • DOI: https://doi.org/10.1007/978-3-030-99736-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99735-9

  • Online ISBN: 978-3-030-99736-6

  • eBook Packages: Computer Science, Computer Science (R0)
