DOI: 10.1145/3691573.3691597
research-article

Enhancing Virtual Human Interactions by Designing a Real-Time Dialog Filter for Mitigating Nonsensical Responses

Published: 30 September 2024

Abstract

Virtual Humans (VHs) are crucial for facilitating discussions on sensitive topics and for training interpersonal interactions. However, conversational errors, such as nonsensical responses, undermine the effectiveness of VH simulations. This paper explores real-time dialog filters that detect such undesired exchanges. We iteratively apply a five-step prompt design process and use OpenAI’s GPT large language model to demonstrate feasibility. Our filter distinguishes meaningful from nonsensical responses generated by a rule-based system, achieving an F1 score of 0.84 and an accuracy of 0.78. Comparison with human-expert classifications validates its efficacy. Filtering out nonsensical responses keeps interactions coherent and relevant, significantly enhancing the efficacy of VH simulations. This study underscores how large language models can refine existing VH systems and improve virtual human dialogues.
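For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an F1 score and an accuracy can be computed when a filter's labels are compared against human-expert labels. It is not the authors' evaluation code. The function name `f1_and_accuracy`, the choice of "nonsensical" as the positive class, and the toy labels are all assumptions made for illustration; the abstract does not state which class was treated as positive, and the toy labels are not the paper's data.

```python
from typing import Sequence, Tuple


def f1_and_accuracy(expert: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (F1, accuracy) for binary labels, with True as the positive class.

    Assumption for this sketch: True means "nonsensical response".
    """
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not expert:
        raise ValueError("label sequences must not be empty")

    # Confusion-matrix counts.
    tp = sum(1 for e, p in zip(expert, predicted) if e and p)
    fp = sum(1 for e, p in zip(expert, predicted) if not e and p)
    fn = sum(1 for e, p in zip(expert, predicted) if e and not p)
    tn = sum(1 for e, p in zip(expert, predicted) if not e and not p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(expert)
    return f1, accuracy


# Hypothetical toy labels (not from the paper): expert judgment vs. filter output.
expert_labels = [True, True, True, False, False, True, False, True]
filter_labels = [True, True, False, False, True, True, False, True]

f1, acc = f1_and_accuracy(expert_labels, filter_labels)
print(f"F1 = {f1:.2f}, accuracy = {acc:.2f}")
```

Because F1 ignores true negatives while accuracy counts them, the two numbers can differ noticeably on imbalanced data, which is one reason both are commonly reported.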

Published In

SVR '24: Proceedings of the 26th Symposium on Virtual and Augmented Reality
September 2024
346 pages
ISBN: 9798400709791
DOI: 10.1145/3691573

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Large Language Models
2. Nonsensical Responses
3. Real-Time Dialog Filter
4. Virtual Human Simulation
5. Virtual Humans

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

SVR 2024: Symposium on Virtual and Augmented Reality
September 30 - October 3, 2024
Manaus, Brazil
