
An Empirical Study on How Large Language Models Impact Software Testing Learning

Published: 18 June 2024

Abstract

Software testing is a challenging topic in software engineering education and requires creative approaches to engage learners. For example, the Code Defenders game has students compete over a Java class under test by writing effective tests and mutants. While such gamified approaches deal with problems of motivation and engagement, students may nevertheless require help to put testing concepts into practice. The recent widespread diffusion of Generative AI and Large Language Models raises the question of whether and how these disruptive technologies could address this problem, for example, by providing explanations of unclear topics and guidance for writing tests. However, such technologies might also be misused or produce inaccurate answers, which would negatively impact learning. To shed more light on this situation, we conducted the first empirical study investigating how students learn and practice new software testing concepts in the context of the Code Defenders testing game, supported by a smart assistant based on a widely known, commercial Large Language Model. Our study shows that students had unrealistic expectations about the smart assistant, “blindly” trusting any output it generated, and often trying to use it to obtain solutions for testing exercises directly. Consequently, students who resorted to the smart assistant more often were less effective and efficient than those who did not. For instance, they wrote 8.6% fewer tests, and their tests were not useful in 78.0% of the cases. We conclude that giving unrestricted and unguided access to Large Language Models might generally impair learning. Thus, we believe our study helps to raise awareness about the implications of using Generative AI and Large Language Models in Computer Science Education and provides guidance towards developing better and smarter learning tools.
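To make the test-versus-mutant mechanic of Code Defenders concrete, the following is a minimal illustrative sketch, not taken from the paper: an attacker plants a subtle defect (a mutant) in the class under test, and a defender writes a JUnit test that kills it, i.e. passes on the original code but fails on the mutant. The class Counter, its behavior, and the test are hypothetical and introduced here only for illustration.

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical class under test (name and behavior are assumed for illustration).
class Counter {
    private int value;

    int add(int amount) {
        value += amount;   // an attacker's mutant might change this to: value -= amount;
        return value;
    }
}

// A defender's test that kills that mutant: on the original code add(2) returns 2,
// but under the mutation it returns -2, so the assertion fails and the mutant is detected.
class CounterTest {
    @Test
    void addIncrementsValue() {
        assertEquals(2, new Counter().add(2));
    }
}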

Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024, 728 pages
ISBN: 9798400717017
DOI: 10.1145/3661167
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. ChatGPT
  2. Computer Science Education
  3. Generative AI
  4. Smart Learning Assistant

Qualifiers

  • Research article
  • Refereed limited

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate: 71 of 232 submissions (31%)
