Abstract
The results of empirical studies in Software Engineering are limited to particular contexts and difficult to generalise, and the studies themselves are expensive to perform. Despite these problems, empirical studies can be made effective, and they are important to both researchers and practitioners. The key to their effectiveness lies in maximising the information that can be gained, by examining and replicating existing studies and by using power analyses to determine an accurate minimum sample size. This approach was applied in a controlled experiment examining the combination of automated static analysis tools and code inspection in the context of the verification and validation (V&V) of concurrent Java components. The paper presents the results of this controlled experiment and shows that the combination of automated static analysis and code inspection is cost-effective. Throughout the experiment, a strategy to maximise the information gained was used. As a result, despite the size of the study, conclusive results were obtained, contributing to the research on V&V technology evaluation.
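The abstract does not say how the power analysis was carried out. As a rough illustration only, the sketch below estimates a minimum sample size per group for a two-sided comparison of two group means, using the standard normal approximation. The function name, the default significance level (0.05), and the default power (0.80) are assumptions chosen for the example, not values taken from the study.

```python
import math
from statistics import NormalDist


def min_sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    effect_size is a standardised mean difference (Cohen's d).
    This is a hypothetical helper for illustration, not the procedure
    used in the paper.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power (1 - beta).
    z_power = standard_normal.inv_cdf(power)

    # n per group = 2 * ((z_alpha + z_power) / d)^2, rounded up.
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Conventional small, medium, and large values of Cohen's d.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: at least {min_sample_size_per_group(d)} subjects per group")
```

The normal approximation slightly underestimates the sample size that an exact t-test calculation would give, so in practice the result is best treated as a lower bound. The main point it illustrates is that the required sample size falls sharply as the expected effect size grows, which is why a small study can still yield conclusive results when the effect is large.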
Additional information
Editor: José Carlo Maldonado
About this article
Cite this article
Wojcicki, M.A., Strooper, P. Maximising the information gained from a study of static analysis technologies for concurrent software. Empir Software Eng 12, 617–645 (2007). https://doi.org/10.1007/s10664-007-9044-6
DOI: https://doi.org/10.1007/s10664-007-9044-6