Introducing Multi-modality in Persuasive Task Oriented Virtual Sales Agent

  • Conference paper

Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)

Abstract

In recent years, virtual assistants that complete tasks such as service scheduling and online shopping have grown in both popularity and necessity. The primary objective of a task-oriented conversational agent is to serve an end user's task goals effectively and successfully. Beyond that, user satisfaction is one of the most important aspects to address. Communicating with multi-modal responses makes a conversation easier and more engaging, and responding with appropriate images can improve the quality of a task-oriented conversation in terms of user satisfaction. Keeping these aspects in mind, we propose a framework that infuses multi-modality into an end-to-end persuasive task-oriented dialogue generation module. Additionally, we create a personalised persuasive multi-modal dialogue (PPMD) corpus, annotated at the turn level with slots, sentiment, and agent actions, that contains multi-modal responses from both ends. The results and thorough analysis on this dataset show that the proposed multi-modal persuasive virtual assistant outperforms traditional task-oriented frameworks in terms of user satisfaction.
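To make the turn-level annotation scheme described in the abstract concrete, the sketch below shows what a single annotated turn in a PPMD-style corpus might look like. This is a minimal illustration in Python; the record fields (speaker, utterance, intent, slots, sentiment, agent_action, image) are hypothetical names chosen here for clarity, not the paper's actual schema.

    # A minimal, hypothetical sketch of one turn-level record in a
    # PPMD-style corpus; field names are illustrative assumptions,
    # not the paper's actual annotation schema.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class AnnotatedTurn:
        speaker: str                        # "user" or "agent"
        utterance: str                      # surface text of the turn
        intent: str                         # intent label for the turn
        slots: dict = field(default_factory=dict)  # slot-value pairs
        sentiment: str = "neutral"          # turn-level sentiment label
        agent_action: Optional[str] = None  # dialogue act for agent turns
        image: Optional[str] = None         # attached image path, if any

    # Example agent turn pairing a persuasive utterance with an image,
    # reflecting the multi-modal responses the corpus is said to contain.
    turn = AnnotatedTurn(
        speaker="agent",
        utterance="This phone's 108 MP camera suits your photography hobby.",
        intent="persuade",
        slots={"camera": "108 MP"},
        agent_action="inform",
        image="images/phone_123.jpg",
    )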



Author information

Correspondence to Aritra Raut.

Appendix

Table 5. Intent, slot and dialogue act list of the PPMD dataset
Table 6. Examples of different persuasion strategies
Fig. 6. An example of a generated conversation from multi-USBAR

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Raut, A. et al. (2023). Introducing Multi-modality in Persuasive Task Oriented Virtual Sales Agent. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_46

  • DOI: https://doi.org/10.1007/978-3-031-30111-7_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30110-0

  • Online ISBN: 978-3-031-30111-7

  • eBook Packages: Computer Science, Computer Science (R0)
