DOI: 10.1145/3106237.3106288

Measuring the cost of regression testing in practice: a study of Java projects using continuous integration

Published: 21 August 2017

ABSTRACT

Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it.

In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, to examine the real test failures encountered by the developers of those projects. We determined how the developers resolved each failure and used this information to classify it as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We treat failures caused by bugs as a benefit of the test suite, and failures caused by broken or obsolete tests as a test suite maintenance cost.
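
As a rough illustration of this classification logic (a minimal sketch only: the paper derived these labels from the developers' actual fixes, and the BuildRecord type and its fields below are hypothetical, not from the paper's artifacts), the decision can be expressed in Java, the language of the studied projects:

    // Hypothetical sketch of the failure classification described above;
    // none of these names come from the paper's artifacts.
    enum FailureCause { FLAKY, BUG_IN_SUT, BROKEN_OR_OBSOLETE_TEST }

    final class BuildRecord {
        boolean passesOnRerunOfSameCommit; // same code, different outcome
        boolean fixTouchedTestCodeOnly;    // resolving commit changed only test code

        FailureCause classify() {
            if (passesOnRerunOfSameCommit) {
                return FailureCause.FLAKY; // non-deterministic failure
            }
            if (fixTouchedTestCodeOnly) {
                // the test had to be repaired or updated: a maintenance cost
                return FailureCause.BROKEN_OR_OBSOLETE_TEST;
            }
            return FailureCause.BUG_IN_SUT; // the suite caught a real defect
        }
    }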

We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test.
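
To make the composition of these percentages concrete, consider a hypothetical batch of 100 failed test-suite executions (the counts are invented for illustration; only the rates come from the study):

    // Hypothetical breakdown of 100 failed suite executions using the
    // percentages reported above.
    public class FailureBreakdown {
        public static void main(String[] args) {
            double failed = 100.0;
            double flaky = failed * 0.13;         // ~13 flaky failures
            double nonFlaky = failed - flaky;     // ~87 genuine failures
            double bugs = nonFlaky * 0.74;        // ~64 expose a real bug
            double brokenTests = nonFlaky * 0.26; // ~23 require fixing the test
            System.out.printf("flaky=%.0f bugs=%.0f brokenTests=%.0f%n",
                              flaky, bugs, brokenTests);
        }
    }

In other words, about 64 of every 100 failures point at a real defect, while about 23 are pure test maintenance.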

Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle.
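
The "over 99%" bound follows from the figures above. A back-of-the-envelope reconstruction, assuming roughly uniform suite sizes across builds (our simplification, not a claim made by the paper):

    // Rough reconstruction of the perfect-oracle bound from the reported
    // numbers; the uniform-suite-size assumption is ours.
    public class OracleBound {
        public static void main(String[] args) {
            double failedBuildShare = 0.18;   // 18% of suite executions fail
            double failingExecShare = 0.0038; // 0.38% of test executions fail
                                              // within those failed builds

            // A perfect oracle would run only the executions that fail;
            // passing builds contribute no failing executions at all.
            double mustRun = failedBuildShare * failingExecShare;
            System.out.printf("still needed: %.4f%% of executions%n", 100 * mustRun);
            System.out.printf("eliminable:   %.2f%% of executions%n", 100 * (1 - mustRun));
            // Prints roughly 0.07% needed, 99.93% eliminable: "over 99%".
        }
    }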

Published in

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017, 1073 pages
ISBN: 9781450351058
DOI: 10.1145/3106237

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article

        Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%
