DOI: 10.1145/3644032.3644456

Coverage-based Strategies for the Automated Synthesis of Test Scenarios for Conversational Agents

Published: 10 June 2024

Abstract

Conversational agents - or chatbots - are increasingly used as the user interface to many software services. While open-domain chatbots like ChatGPT excel at chatting about any topic, task-oriented conversational agents are designed to perform goal-oriented tasks (e.g., booking or shopping) guided by an explicitly designed, dialogue-based user interaction. Like any kind of software system, task-oriented conversational agents need to be properly tested to ensure their quality. For this purpose, some tools permit defining and executing conversation test cases. However, there are currently no established means to assess how well a test suite covers the design of a task-oriented agent, nor mechanisms to automate the generation of quality test cases that ensure agent coverage.
To address this problem, we propose test coverage criteria for task-oriented conversational agents, and define coverage-based strategies to synthesise test scenarios, some of which are oriented to test case reduction. We provide an implementation of the criteria and the strategies that is independent of the agent development platform. Finally, we report on their evaluation on open-source Dialogflow and Rasa agents, and on a comparison against a state-of-the-art testing tool. The experiment shows benefits in terms of test generation correctness, increased coverage and reduced testing time.
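The paper's coverage criteria and synthesis strategies are not reproduced on this page. Purely as a rough illustration of the kind of measurement the abstract describes, the sketch below computes intent-level coverage of a toy test suite over a toy agent design; the `intent_coverage` function, the agent, and the tests are all invented for this example, and the paper defines richer criteria than this.

```python
# Hypothetical sketch: intent-level coverage of a chatbot test suite.
# The agent design and test cases below are invented for illustration;
# the paper's criteria go beyond intents (this is not its method).

def intent_coverage(agent_intents, test_suite):
    """Return (coverage ratio, set of intents no test exercises)."""
    exercised = {intent for test in test_suite for intent in test}
    covered = agent_intents & exercised
    uncovered = agent_intents - covered
    return len(covered) / len(agent_intents), uncovered

# A toy task-oriented agent with four intents and two conversation tests,
# each test listed as the sequence of intents it triggers.
agent = {"greet", "order_pizza", "confirm_order", "goodbye"}
tests = [
    ["greet", "order_pizza"],   # test 1 exercises two intents
    ["greet", "goodbye"],       # test 2 revisits greet
]

ratio, missing = intent_coverage(agent, tests)
print(ratio, sorted(missing))   # prints: 0.75 ['confirm_order']
```

An uncovered intent like `confirm_order` is exactly the signal a coverage-based synthesis strategy would use to generate an additional test scenario.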


Cited By

  • (2024) Test Adequacy Criteria for Metamorphic Testing. 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), pp. 527-534. DOI: 10.1109/QRS-C63300.2024.00072. Online publication date: 1 Jul 2024.

Published In

AST '24: Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024)
April 2024
235 pages
ISBN:9798400705885
DOI:10.1145/3644032
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. testing
  2. test suite generation
  3. task-oriented conversational agents

Qualifiers

  • Research-article

Funding Sources

  • Spanish MICINN

Conference

AST '24

Article Metrics

  • Downloads (last 12 months): 59
  • Downloads (last 6 weeks): 7

Reflects downloads up to 20 Jan 2025

