DOI: 10.1145/3661167.3661220
research-article

Mutation Testing for Task-Oriented Chatbots

Published: 18 June 2024

Abstract

Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots – like ChatGPT – can converse on any topic, task-oriented chatbots – the focus of this paper – are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
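To make the idea of measuring test strength concrete, the following is a minimal sketch of mutation testing on a toy chatbot. It is not the paper's tool, architecture, or operator set: the `BOT` structure, the exact-match `reply` function, and the single "delete a training phrase" operator are all hypothetical, chosen only to show how a mutation score (killed mutants divided by total mutants) reflects the completeness of a set of test scenarios.

```python
from copy import deepcopy

# Hypothetical toy chatbot design: each intent has training phrases and a response.
BOT = {
    "book_flight": {
        "phrases": ["book a flight", "i need a plane ticket"],
        "response": "Where to?",
    },
    "greet": {
        "phrases": ["hello", "hi"],
        "response": "Hello! How can I help?",
    },
}

FALLBACK = "Sorry, I did not understand."


def reply(bot, utterance):
    """Return the response of the first intent with an exactly matching phrase."""
    for intent in bot.values():
        if utterance.lower() in intent["phrases"]:
            return intent["response"]
    return FALLBACK


def delete_phrase_mutants(bot):
    """Illustrative mutation operator: each mutant drops one training phrase."""
    for name, intent in bot.items():
        for i in range(len(intent["phrases"])):
            mutant = deepcopy(bot)
            del mutant[name]["phrases"][i]
            yield mutant


def mutation_score(bot, scenarios):
    """Fraction of mutants for which at least one scenario fails (is 'killed')."""
    mutants = list(delete_phrase_mutants(bot))
    killed = sum(
        any(reply(m, said) != expected for said, expected in scenarios)
        for m in mutants
    )
    return killed / len(mutants)


# Test scenarios as (user utterance, expected chatbot response) pairs.
SCENARIOS = [
    ("hello", "Hello! How can I help?"),
    ("book a flight", "Where to?"),
]

print(mutation_score(BOT, SCENARIOS))
```

In this sketch the two scenarios kill only the mutants that remove "hello" and "book a flight"; the mutants that remove "hi" and "i need a plane ticket" survive, which points at utterances the test scenarios never exercise.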

Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN:9798400717017
DOI:10.1145/3661167
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Botium
  2. Dialogflow
  3. Mutation testing
  4. Rasa
  5. Task-oriented chatbots

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
