ABSTRACT
A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the (feasible) requirements is C-adequate.
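As an illustration only (not the paper's tooling), a coverage criterion can be modeled as a set of test requirements; a suite's coverage value is the fraction of the feasible requirements it satisfies, and C-adequacy means that fraction reaches 100%. The class and method names below are hypothetical.

```java
import java.util.Set;

// Minimal sketch, not the authors' implementation: a criterion C yields a set of
// test requirements; a suite's coverage is the fraction of feasible requirements it satisfies.
final class Coverage {
    /** Coverage value in [0, 1]: |feasible requirements satisfied| / |feasible requirements|. */
    static double of(Set<String> feasibleRequirements, Set<String> satisfied) {
        if (feasibleRequirements.isEmpty()) {
            return 1.0; // vacuously adequate when there is nothing to satisfy
        }
        long met = feasibleRequirements.stream().filter(satisfied::contains).count();
        return (double) met / feasibleRequirements.size();
    }

    /** A suite is C-adequate when it satisfies all feasible requirements of C. */
    static boolean isAdequate(Set<String> feasibleRequirements, Set<String> satisfied) {
        return of(feasibleRequirements, satisfied) == 1.0;
    }
}
```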
Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given criteria C and C′, are C-adequate suites (on average) more effective than C′-adequate suites? However, in many realistic cases producing adequate suites is impractical or even impossible. We present the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2, ..., Tn have coverage values c1, c2, ..., cn for C and c′1, c′2, ..., c′n for C′, is it better to compare suites based on c1, c2, ..., cn or based on c′1, c′2, ..., c′n?
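One plausible way to operationalize this question, sketched below under the assumption that each suite Ti also has an independently measured effectiveness score (for example, a mutation score), is to ask which criterion's coverage values correlate better with effectiveness across the suites, e.g., via Kendall's τ. The class and method names are hypothetical, and the τ computation is the simple O(n²) tau-a form.

```java
// Sketch of the kind of comparison described above: the criterion whose coverage
// values track an independent effectiveness measure more closely is the better
// one to use when ranking non-adequate suites.
final class CriteriaComparison {
    /** Kendall's tau-a over paired samples, computed in O(n^2). */
    static double kendallTau(double[] x, double[] y) {
        int concordant = 0, discordant = 0, n = x.length;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }

    /** True if criterion C (coverage values c) predicts effectiveness better than C′ (values cPrime). */
    static boolean cIsBetterPredictor(double[] c, double[] cPrime, double[] effectiveness) {
        return kendallTau(c, effectiveness) > kendallTau(cPrime, effectiveness);
    }
}
```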
We evaluate a large set of plausible criteria, including statement and branch coverage, as well as stronger criteria used in recent studies. Two criteria perform best: branch coverage and an intra-procedural acyclic path coverage.
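To make the second criterion concrete: intra-procedural acyclic path coverage takes each acyclic path through a single method's control-flow graph (CFG) as one test requirement, in the spirit of Ball-Larus path profiling. The sketch below, with hypothetical names, enumerates those paths for a small CFG given as an adjacency map; it is illustrative only and not how profiling tools actually number paths.

```java
import java.util.*;

// Illustrative only: enumerate the acyclic paths from a method's entry node to
// its exit node; each such path is one requirement of the path-coverage criterion.
final class AcyclicPaths {
    static List<List<String>> enumerate(Map<String, List<String>> cfg, String entry, String exit) {
        List<List<String>> paths = new ArrayList<>();
        dfs(cfg, entry, exit, new ArrayDeque<>(), paths);
        return paths;
    }

    private static void dfs(Map<String, List<String>> cfg, String node, String exit,
                            Deque<String> stack, List<List<String>> paths) {
        if (stack.contains(node)) return;           // ignore back edges: acyclic paths only
        stack.addLast(node);
        if (node.equals(exit)) {
            paths.add(new ArrayList<>(stack));      // one acyclic path = one test requirement
        } else {
            for (String succ : cfg.getOrDefault(node, List.of())) {
                dfs(cfg, succ, exit, stack, paths);
            }
        }
        stack.removeLast();
    }
}
```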