
Evaluating Humorous Response Generation to Playful Shopping Requests

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13981)

Abstract

AI assistants are gradually becoming embedded in our daily lives, handling everyday tasks such as shopping and playing music. Beyond these utilitarian uses, many users engage AI assistants with playful shopping requests, either probing their ability to understand or simply seeking amusement. However, such requests are often not answered in the same playful manner, causing dissatisfaction and even eroding trust.

In this work, we focus on equipping AI assistants with the ability to respond playfully to irrational shopping requests. We first evaluate several neural generation models, which produce unsuitable results, showing that this task is non-trivial. We then devise a simple yet effective solution that uses a knowledge graph to generate template-based responses grounded in commonsense. While the commonsense-aware solution is slightly less diverse than the generative models, it provides better responses to playful requests. This highlights the commonsense gap exhibited by neural language models.
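The knowledge-graph-plus-template approach described above can be illustrated with a minimal sketch. The relation names below follow ConceptNet-style edges, but the triples, templates, and item names are illustrative stand-ins, not the paper's actual data or filtering logic:

```python
# A tiny stand-in for a commonsense knowledge graph (e.g., ConceptNet),
# mapping a non-shoppable item to (relation, object) facts.
KG = {
    "the moon": [("AtLocation", "the sky"), ("HasProperty", "far away")],
    "a dinosaur": [("HasProperty", "extinct")],
}

# One hypothetical response template per relation; a retrieved fact
# fills the slots, grounding the playful response in commonsense.
TEMPLATES = {
    "AtLocation": "Sorry, I can't ship {item} -- it's busy being in {obj}!",
    "HasProperty": "I'd love to, but {item} is {obj}, so delivery may be tricky.",
}

def playful_response(item: str) -> str:
    """Return a commonsense-grounded playful refusal for a non-shoppable item."""
    for relation, obj in KG.get(item, []):
        template = TEMPLATES.get(relation)
        if template:
            return template.format(item=item, obj=obj)
    # Fallback when no usable fact is found in the knowledge graph.
    return f"Sorry, {item} isn't something I can add to your cart."

print(playful_response("a dinosaur"))
```

Because every response is built from a retrieved fact and a fixed template, outputs are less diverse than free-form neural generation but cannot assert commonsense-violating content, which matches the trade-off reported in the abstract.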

N. Shapira and C. Shani: work was done during an internship at Amazon.

Except for the first author, the remaining authors are listed alphabetically by surname.


Notes

  1. https://registry.opendata.aws/shopping-humor-generation/.

  2. The full list is included in the code repository. T5 had 95 prompts and GPT-2 had 89 (suffix-based prompts are irrelevant to GPT-2, which attends only to the prefix). Top-K = 50, Top-P = 0.95, beam width = 10; max length: 50 for GPT-2, 20 for T5-3B.

  3. The full list of relations, templates, and filtering logic is included in the code repository.

  4. The dataset of non-shoppable items and responses is included in the code repository.

  5. Workers were paid 5 cents per generated non-shoppable item.

  6. Preliminary experiments showed that annotators tended to rank responses with a discourse issue as worse than the baseline response (–1/–2).
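The decoding settings listed in note 2 can be collected into configuration dictionaries, here expressed as Hugging Face `generate()`-style keyword arguments. Only the numeric values come from the note; the grouping, names, and the suggestion that they feed a `generate()` call are assumptions:

```python
# Shared sampling/search settings from note 2.
COMMON = {"top_k": 50, "top_p": 0.95, "num_beams": 10}

# Per-model maximum generation lengths from note 2.
GPT2_KWARGS = {**COMMON, "max_length": 50}
T5_3B_KWARGS = {**COMMON, "max_length": 20}

# Hypothetical usage: model.generate(**inputs, **GPT2_KWARGS)
print(GPT2_KWARGS)
print(T5_3B_KWARGS)
```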


Author information

Correspondence to Natalie Shapira.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Shapira, N., Kalinsky, O., Libov, A., Shani, C., Tolmach, S. (2023). Evaluating Humorous Response Generation to Playful Shopping Requests. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_53


  • DOI: https://doi.org/10.1007/978-3-031-28238-6_53


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28237-9

  • Online ISBN: 978-3-031-28238-6

  • eBook Packages: Computer Science, Computer Science (R0)
