DOI: 10.1145/3644032.3644456

Coverage-based Strategies for the Automated Synthesis of Test Scenarios for Conversational Agents

Published: 10 June 2024

Abstract

Conversational agents - or chatbots - are increasingly used as the user interface to many software services. While open-domain chatbots like ChatGPT excel at chatting about any topic, task-oriented conversational agents are designed to perform goal-oriented tasks (e.g., booking or shopping) guided by an explicitly designed, dialogue-based user interaction. Like any kind of software system, task-oriented conversational agents need to be properly tested to ensure their quality. For this purpose, some tools permit defining and executing conversation test cases. However, there are currently no established means to assess how well a test suite covers the design of a task-oriented agent, nor mechanisms to automate the generation of quality test cases that ensure agent coverage.
To address this problem, we propose test coverage criteria for task-oriented conversational agents, and define coverage-based strategies to synthesise test scenarios, some of which are oriented to test case reduction. We provide an implementation of the criteria and the strategies that is independent of the agent development platform. Finally, we report on their evaluation on open-source Dialogflow and Rasa agents, and on a comparison against a state-of-the-art testing tool. The experiment shows benefits in terms of test generation correctness, increased coverage and reduced testing time.
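The paper's coverage criteria and synthesis strategies are not reproduced on this page. Purely as a rough illustration of the kind of measurement the abstract describes, the sketch below computes intent-level coverage of a toy test suite over a toy agent design; the `intent_coverage` function, the agent, and the tests are all invented for this example, and the paper defines richer criteria than this.

```python
# Hypothetical sketch: intent-level coverage of a chatbot test suite.
# The agent design and test cases below are invented for illustration;
# the paper's criteria go beyond intents (this is not its method).

def intent_coverage(agent_intents, test_suite):
    """Return (coverage ratio, set of intents no test exercises)."""
    exercised = {intent for test in test_suite for intent in test}
    covered = agent_intents & exercised
    uncovered = agent_intents - covered
    return len(covered) / len(agent_intents), uncovered

# A toy task-oriented agent with four intents and two conversation tests,
# each test listed as the sequence of intents it triggers.
agent = {"greet", "order_pizza", "confirm_order", "goodbye"}
tests = [
    ["greet", "order_pizza"],   # test 1 exercises two intents
    ["greet", "goodbye"],       # test 2 revisits greet
]

ratio, missing = intent_coverage(agent, tests)
print(ratio, sorted(missing))   # prints: 0.75 ['confirm_order']
```

An uncovered intent like `confirm_order` is exactly the signal a coverage-based synthesis strategy would use to generate an additional test scenario.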


Cited By

  • (2024) Test Adequacy Criteria for Metamorphic Testing. 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), pp. 527-534. DOI: 10.1109/QRS-C63300.2024.00072. Online publication date: 1 Jul 2024.

Published In

AST '24: Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024)
April 2024
235 pages
ISBN:9798400705885
DOI:10.1145/3644032
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. testing
  2. test suite generation
  3. task-oriented conversational agents

Qualifiers

  • Research-article

Funding Sources

  • Spanish MICINN

Conference

AST '24

Article Metrics

  • Downloads (last 12 months): 59
  • Downloads (last 6 weeks): 7

Reflects downloads up to 20 Jan 2025

