Achieving scalable mutation-based generation of whole test suites

Empirical Software Engineering

Abstract

Without a complete formal specification, automatically generated software tests need to be manually checked in order to detect faults. This makes it desirable to produce the strongest possible test set while keeping the number of tests as small as possible. As commonly applied coverage criteria like branch coverage are potentially weak, mutation testing has been proposed as a stronger criterion. However, mutation-based test generation is hampered because there are usually too many mutants, and too many of these are either trivially killed or equivalent. On such mutants, any effort spent on test generation would by definition be wasted. To overcome this problem, our search-based EvoSuite test generation tool integrates two novel optimizations: first, we avoid redundant test executions on mutants by monitoring state infection conditions, and second, we use whole test suite generation to optimize test suites towards killing the highest number of mutants, rather than selecting individual mutants. These optimizations allowed us to apply EvoSuite to a random sample of 100 open-source projects, consisting of a total of 8,963 classes and more than two million lines of code, leading to a total of 1,380,302 mutants. The experiment demonstrates that our approach scales well, making mutation testing a viable test criterion for automated test case generation tools, and allowing us to analyze the relationship of branch coverage and mutation testing in detail.
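
The two optimizations named in the abstract can be illustrated with a minimal Python sketch. This is not EvoSuite's implementation: the toy function `original`, the three hand-written mutants, and the names `Mutant`, `infects`, `run`, and `suite_fitness` are all assumptions introduced here for illustration. The sketch assumes that a mutant is only executed on a test input when the mutated expression evaluates differently from the original one (state infection), and that fitness is computed for a whole suite as the number of killed mutants instead of targeting one mutant at a time.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def original(x: int) -> int:
    """Toy program under test (hypothetical, not from the paper)."""
    doubled = x * 2
    return doubled if x > 0 else 0


@dataclass(frozen=True)
class Mutant:
    name: str
    # True if input x makes the mutated expression differ from the original
    # expression at the mutation point (state infection).
    infects: Callable[[int], bool]
    # Full execution of the mutated program.
    run: Callable[[int], int]


# Three hand-written mutants of `original` (illustrative assumptions).
MUTANTS = [
    Mutant("gt_to_ge", lambda x: (x > 0) != (x >= 0),
           lambda x: x * 2 if x >= 0 else 0),
    Mutant("mul_to_add", lambda x: x * 2 != x + 2,
           lambda x: x + 2 if x > 0 else 0),
    Mutant("gt0_to_gt1", lambda x: (x > 0) != (x > 1),
           lambda x: x * 2 if x > 1 else 0),
]


def suite_fitness(suite: List[int]) -> Tuple[int, int]:
    """Return (mutants killed, mutant executions performed) for a whole suite."""
    killed = 0
    executions = 0
    for mutant in MUTANTS:
        for x in suite:
            if not mutant.infects(x):
                # No state infection: x cannot kill this mutant, so the
                # mutant execution is skipped entirely.
                continue
            executions += 1
            if mutant.run(x) != original(x):
                killed += 1
                break  # one killing test is enough for this mutant
    return killed, executions


if __name__ == "__main__":
    print(suite_fitness([2, -5, 1]))  # (2, 3)
    print(suite_fitness([0]))         # (0, 2)
```

For the suite `[2, -5, 1]`, only three mutant executions are performed instead of the nine a naive run of every test on every mutant would need. The mutant `gt_to_ge` is never killed: its only infecting input is `0`, and the infected state does not propagate to the output, which is the kind of mutant on which test generation effort would be wasted. A search over suites would compare candidates by the first component of the returned pair; the fitness actually used by a search-based tool is likely to include additional guidance that this sketch omits.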

Notes

  1. We used version 1.02 of SF100. The original version in Fraser and Arcuri (2012b) had 8,784 classes, but more classes became available once we fixed some classpath issues (e.g., missing jars) in some of the projects.

  2. Note that the base case of branch coverage is produced using whole test suite generation; targeting individual branches would lead to lower branch coverage (Fraser and Arcuri 2013c).

  3. This has nothing to do with whether what is proposed in Just et al. (2012) is valuable or not. In particular, it is important to stress that, compared to the literature, the case study in Just et al. (2012) is among the largest and most varied.

References

  • Acree AT (1980) On mutation. PhD thesis, Georgia Institute of Technology, Atlanta, Georgia

  • Adamopoulos K, Harman M, Hierons RM (2004) How to overcome the equivalent mutant problem and achieve tailored selective mutation using co-evolution. In: Genetic and evolutionary computation conference (GECCO). Seattle, Washington, pp 1338–1349

  • Andrews JH, Briand LC, Labiche Y (2005) Is mutation an appropriate tool for testing experiments? In: Proceedings of the 27th international conference on software engineering, ICSE ’05. ACM, pp 402–411

  • Arcuri A (2013) It really does matter how you normalize the branch distance in search-based software testing. Softw Test Verification Reliab (STVR) 23(2):119–147

  • Arcuri A, Briand L (2012) A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. In: Software testing verification and reliability (STVR). doi:10.1002/stvr.1486

  • Arcuri A, Fraser G (2013) Parameter tuning or default values? An empirical investigation in search-based software engineering. In: Empirical software engineering (EMSE). pp 1–30. doi:10.1007/s10664-013-9249-9

  • Ayari K, Bouktif S, Antoniol G (2007) Automatic mutation test input data generation via ant colony. In: Genetic and evolutionary computation conference (GECCO). ACM, New York, pp 1074–1081

  • Baker R, Habli I (2012) An empirical evaluation of mutation testing for improving the test quality of safety-critical software. In: IEEE transactions on software engineering (TSE)

  • Baldwin D, Sayward FG (1979) Heuristics for determining equivalence of program mutations. Technical Report 276, Yale University, New Haven, Connecticut

  • Baudry B, Fleurey F, Jézéquel JM, Le Traon Y (2005) Automatic test cases optimization: a bacteriologic algorithm. IEEE Softw 22(2):76–82

  • Bauersfeld S, Vos T, Lakhotia K, Poulding S, Condori N (2013) Unit testing tool competition. In: International workshop on search-based software testing (SBST). pp 414–420

  • Bottaci L (2001) A genetic algorithm fitness function for mutation testing. In: International workshop on software engineering using metaheuristic innovative algorithms, a workshop at the 23rd International conference on software engineering, SEMINAL 2001. pp 3–7

  • Budd TA (1980) Mutation analysis of program test data. PhD thesis, Yale University, New Haven, Connecticut

  • DeMillo RA, Offutt AJ (1991) Constraint-based automatic test data generation. IEEE Trans Softw Eng 17(9):900–910

  • DeMillo RA, Lipton RJ, Sayward FG (1978) Hints on test data selection: help for the practicing programmer. Computer 11(4):34–41

  • Deng L, Offutt J, Li N (2013) Empirical evaluation of the statement deletion mutation operator. In: IEEE International conference on software testing, verification and validation (ICST)

  • Fleyshgakker VN, Weiss SN (1994) Efficient mutation analysis: a new approach. In: Proceedings of the international symposium on software testing and analysis, ISSTA ’94. Seattle, Washington, pp 185–195

  • Frankl PG, Weiss SN, Hu C (1997) All-uses vs mutation testing: an experimental comparison of effectiveness. J Syst Softw 38(3):235–253

  • Fraser G, Arcuri A (2011a) EvoSuite: Automatic test suite generation for object-oriented software. In: ACM symposium on the foundations of software engineering (FSE). pp 416–419

  • Fraser G, Arcuri A (2011b) It is not the length that matters, it is how you control it. In: IEEE international conference on software testing, verification and validation (ICST). pp 150–159

  • Fraser G, Arcuri A (2012a) The seed is strong: Seeding strategies in search-based software testing. In: IEEE international conference on software testing, verification and validation (ICST). pp 121–130

  • Fraser G, Arcuri A (2012b) Sound empirical evidence in software testing. In: ACM/IEEE international conference on software engineering (ICSE). pp 178–188

  • Fraser G, Arcuri A (2013a) Evosuite at the SBST 2013 tool competition. In: International workshop on search-based software testing (SBST). pp 406–409

  • Fraser G, Arcuri A (2013b) EvoSuite: On the challenges of test case generation in the real world (tool paper). In: IEEE international conference on software testing, verification and validation (ICST)

  • Fraser G, Arcuri A (2013c) Whole test suite generation. IEEE Trans Softw Eng (TSE) 39(2):276–291

  • Fraser G, Zeller A (2012) Mutation-driven generation of unit tests and oracles. IEEE Trans Softw Eng (TSE) 38(2):278–292

  • Fraser G, Arcuri A, McMinn P (2013) Test suite generation with memetic algorithms. In: Genetic and evolutionary computation conference (GECCO)

  • Godefroid P, Klarlund N, Sen K (2005) DART: directed automated random testing. In: Proceedings of the 2005 ACM SIGPLAN conference on programming language design and implementation, PLDI ’05. ACM, pp 213–223

  • Hamlet RG (1977) Testing programs with the aid of a compiler. IEEE Trans Softw Eng 3(4):279–290

  • Harman M, Jia Y, Langdon WB (2011) Strong higher order mutation-based test data generation. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, ESEC/FSE ’11. ACM, pp 212–222

  • Hierons RM, Harman M, Danicic S (1999) Using program slicing to assist in the detection of equivalent mutants. Softw Test Verification Reliab 9(4):233–262

  • Howden WE (1982) Weak mutation testing and completeness of test sets. IEEE Trans Softw Eng 8(4):371–379

  • Jia Y, Harman M (2009) Higher order mutation testing. J Informat Softw Technol 51(10):1379–1393

  • Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng (TSE) 37(5):649–678

  • Just R, Kapfhammer GM, Schweiggert F (2012) Using non-redundant mutation operators and test suite prioritization to achieve efficient and scalable mutation analysis. In: Proceedings of the 2012 IEEE 23rd international symposium on software reliability engineering, ISSRE ’12. IEEE Computer Society, pp 11–20

  • Just R, Ernst MD, Fraser G (2013) Using state infection conditions to detect equivalent mutants and speed up mutation analysis. arXiv preprint arXiv:1303.2784

  • Korel B (1990) Automated software test data generation. In: IEEE Transactions on software engineering, pp 870–879

  • Mateo PR, Usaola MP, Aleman JLF (2012) Validating 2nd-order mutation at system level. In: IEEE Transactions on software engineering (TSE)

  • McMinn P (2004) Search-based software test data generation: a survey. Softw Test Verification Reliab 14(2):105–156

  • Offutt AJ (1992) Investigations of the software testing coupling effect. ACM Trans Softw Eng Methodol 1(1):5–20

  • Offutt AJ, Craft WM (1994) Using compiler optimization techniques to detect equivalent mutants. Softw Test Verification Reliab 4(3):131–154

  • Offutt AJ, Lee SD (1991) How strong is weak mutation? In: Proceedings of the symposium on testing, analysis, and verification. TAV4. ACM, pp 200–213

  • Offutt AJ, Lee SD (1994) An empirical evaluation of weak mutation. IEEE Trans Softw Eng 20(5):337–344

  • Offutt AJ, Pan J (1997) Automatically detecting equivalent mutants and infeasible paths. Softw Test Verification Reliab 7(3):165–192

  • Offutt AJ, Untch RH (2001) Mutation 2000: uniting the orthogonal. In: Mutation testing for the new century. Kluwer Academic Publishers, Norwell, MA, pp 34–44

  • Offutt AJ, Rothermel G, Zapf C (1993) An experimental evaluation of selective mutation. In: Proceedings of the 15th international conference on software engineering, ICSE ’93. Baltimore, Maryland, pp 100–107

  • Offutt AJ, Ma YS, Kwon YR (2004) An experimental mutation system for Java. ACM SIGSOFT Softw Eng Notes 29(5):1–4

  • Pacheco C, Ernst MD (2007) Randoop: feedback-directed random testing for Java. In: Companion to the 22nd ACM SIGPLAN conference on object-oriented programming systems and application, OOPSLA ’07. ACM, pp 815–816

  • Papadakis M, Malevris N (2010) Automatic mutation test case generation via dynamic symbolic execution. In: IEEE 21st International symposium on software reliability engineering (ISSRE). pp 121–130

  • Patrick M, Alexander R, Oriol M, Clark JA (2013) Using mutation analysis to evolve subdomains for random testing. In: International workshop on mutation analysis

  • Schuler D, Zeller A (2010) (Un-)Covering equivalent mutants. In: Proceedings of the 3rd international conference on software testing, verification, and validation, ICST ’10. IEEE Computer Society, pp 45–54

  • Staats M, Whalen MW, Heimdahl MP (2011) Programs, tests, and oracles: the foundations of testing revisited. In: ACM/IEEE international conference on software engineering (ICSE). pp 391–400

  • Untch RH (1992) Mutation-based software testing using program schemata. In: Proceedings of the 30th annual southeast regional conference (ACM-SE’92). Raleigh, North Carolina, pp 285–291

  • Walsh PJ (1985) A measure of test case completeness (software, engineering). PhD thesis, State University of New York at Binghamton, Binghamton, NY

  • Wong WE, Mathur AP, Maldonado JC (1995) Mutation versus all-uses: an empirical evaluation of cost, strength and effectiveness. In: Software quality and productivity: theory, practice and training. Chapman & Hall, London, pp 258–265

  • Zhang L, Xie T, Zhang L, Tillmann N, de Halleux J, Mei H (2010) Test generation via dynamic symbolic execution for mutation testing. In: Proceedings of the 2010 IEEE international conference on software maintenance, ICSM ’10. IEEE Computer Society, pp 1–10

Acknowledgments

This project has been funded by a Google Focused Research Award on “Test Amplification” and the Norwegian Research Council.

Author information

Corresponding author

Correspondence to Gordon Fraser.

Additional information

Communicated by: Antonia Bertolino

About this article

Cite this article

Fraser, G., Arcuri, A. Achieving scalable mutation-based generation of whole test suites. Empir Software Eng 20, 783–812 (2015). https://doi.org/10.1007/s10664-013-9299-z

  • DOI: https://doi.org/10.1007/s10664-013-9299-z
