
An Empirical Study on How Large Language Models Impact Software Testing Learning

Published: 18 June 2024

Abstract

Software testing is a challenging topic in software engineering education and requires creative approaches to engage learners. For example, the Code Defenders game has students compete over a Java class under test by writing effective tests and mutants. While such gamified approaches deal with problems of motivation and engagement, students may nevertheless require help to put testing concepts into practice. The recent widespread diffusion of Generative AI and Large Language Models raises the question of whether and how these disruptive technologies could address this problem, for example, by providing explanations of unclear topics and guidance for writing tests. However, such technologies might also be misused or produce inaccurate answers, which would negatively impact learning. To shed more light on this situation, we conducted the first empirical study investigating how students learn and practice new software testing concepts in the context of the Code Defenders testing game, supported by a smart assistant based on a widely known, commercial Large Language Model. Our study shows that students had unrealistic expectations about the smart assistant, “blindly” trusting any output it generated, and often trying to use it to obtain solutions for testing exercises directly. Consequently, students who resorted to the smart assistant more often were less effective and efficient than those who did not. For instance, they wrote 8.6% fewer tests, and their tests were not useful in 78.0% of the cases. We conclude that giving unrestricted and unguided access to Large Language Models might generally impair learning. Thus, we believe our study helps to raise awareness about the implications of using Generative AI and Large Language Models in Computer Science Education and provides guidance towards developing better and smarter learning tools.
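To make the test-versus-mutant mechanic of Code Defenders concrete, the following is a minimal illustrative sketch, not taken from the paper: an attacker plants a subtle defect (a mutant) in the class under test, and a defender writes a JUnit test that kills it, i.e. passes on the original code but fails on the mutant. The class Counter, its behavior, and the test are hypothetical and introduced here only for illustration.

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical class under test (name and behavior are assumed for illustration).
class Counter {
    private int value;

    int add(int amount) {
        value += amount;   // an attacker's mutant might change this to: value -= amount;
        return value;
    }
}

// A defender's test that kills that mutant: on the original code add(2) returns 2,
// but under the mutation it returns -2, so the assertion fails and the mutant is detected.
class CounterTest {
    @Test
    void addIncrementsValue() {
        assertEquals(2, new Counter().add(2));
    }
}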

Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024, 728 pages
ISBN: 9798400717017
DOI: 10.1145/3661167
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. ChatGPT
  2. Computer Science Education
  3. Generative AI
  4. Smart Learning Assistant

Qualifiers

  • Research article
  • Refereed limited

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate: 71 of 232 submissions (31%)
