Mutation Testing for Task-Oriented Chatbots

Authors: Pablo Gómez-Abajo, Sara Pérez-Soler, Pablo C. Cañizares, Juan de Lara

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering

Pages 232 - 241

https://doi.org/10.1145/3661167.3661220

Published: 18 June 2024

Abstract

Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots – like ChatGPT – can converse on any topic, task-oriented chatbots – the focus of this paper – are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.

To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
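The idea of quantifying test strength through mutation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its Eclipse plugin: the toy `DESIGN` dictionary, the exact-match `reply` function standing in for an NLP engine, and the two operators (deleting a training phrase, swapping two responses). Each mutant is a faulty copy of the chatbot design; a mutant is "killed" when at least one test scenario fails on it, and the mutation score is the fraction of killed mutants.

```python
from copy import deepcopy

# Hypothetical toy design: intent name -> training phrases and a response.
DESIGN = {
    "book_flight": {"phrases": ["book a flight", "i need a flight"],
                    "response": "Where would you like to fly?"},
    "support": {"phrases": ["i need help"],
                "response": "How can I help you?"},
}
FALLBACK = "Sorry, I did not understand."


def reply(design, utterance):
    """Naive stand-in for an NLP engine: exact match on training phrases."""
    text = utterance.lower().strip()
    for intent in design.values():
        if text in intent["phrases"]:
            return intent["response"]
    return FALLBACK


def delete_phrase_mutants(design):
    """Assumed operator: each mutant drops one training phrase."""
    for name, intent in design.items():
        for i in range(len(intent["phrases"])):
            mutant = deepcopy(design)
            del mutant[name]["phrases"][i]
            yield mutant


def swap_response_mutants(design):
    """Assumed operator: each mutant exchanges the responses of two intents."""
    names = list(design)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            mutant = deepcopy(design)
            a, b = names[i], names[j]
            mutant[a]["response"] = design[b]["response"]
            mutant[b]["response"] = design[a]["response"]
            yield mutant


def passes(design, scenarios):
    """A scenario is a (user utterance, expected chatbot response) pair."""
    return all(reply(design, user) == expected for user, expected in scenarios)


def mutation_score(design, scenarios):
    """Return (killed, total): a mutant is killed if any scenario fails on it."""
    mutants = list(delete_phrase_mutants(design))
    mutants += list(swap_response_mutants(design))
    killed = sum(1 for m in mutants if not passes(m, scenarios))
    return killed, len(mutants)


SCENARIOS = [("Book a flight", "Where would you like to fly?"),
             ("I need help", "How can I help you?")]

killed, total = mutation_score(DESIGN, SCENARIOS)
print(f"killed {killed} of {total} mutants")  # killed 3 of 4 mutants
```

In this toy setting the surviving mutant is the one that deletes the phrase "i need a flight", which shows that no scenario exercises that phrase; adding a scenario for it kills all four mutants.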

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Natural language interfaces
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging

Information & Contributors

Information

Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering

June 2024

728 pages

ISBN:9798400717017

DOI:10.1145/3661167

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Ministerio de Ciencia, Innovación y Universidades

Conference

EASE 2024

EASE 2024: 28th International Conference on Evaluation and Assessment in Software Engineering

June 18 - 21, 2024

Salerno, Italy

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
