
Maximising the information gained from a study of static analysis technologies for concurrent software

Empirical Software Engineering

Abstract

The results of empirical studies in software engineering are limited to particular contexts and difficult to generalise, and the studies themselves are expensive to perform. Despite these problems, empirical studies can be made effective, and they are important to both researchers and practitioners. The key to their effectiveness lies in maximising the information gained: examining and replicating existing studies, and using power analyses to determine an accurate minimum sample size. This approach was applied in a controlled experiment examining the combination of automated static analysis tools and code inspection in the context of the verification and validation (V&V) of concurrent Java components. The paper presents the results of this controlled experiment and shows that the combination of automated static analysis and code inspection is cost-effective. A strategy to maximise the information gained was used throughout the experiment. As a result, despite the modest size of the study, conclusive results were obtained, contributing to the research on V&V technology evaluation.
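As an illustration of the power-analysis step mentioned above, the standard minimum-sample-size calculation for comparing two groups is sketched below. The effect size, significance level and power shown are conventional (Cohen-style) illustrative values, not the parameters used in this study.

% Minimum per-group sample size for a two-group comparison
% (normal approximation, standardised effect size d):
n \;\geq\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}}
% Illustrative values: d = 0.8, alpha = 0.05 (z_{0.975} = 1.96),
% power 1 - beta = 0.80 (z_{0.80} = 0.84):
% n >= 2 (1.96 + 0.84)^2 / 0.8^2 = 24.5, i.e. about 25 subjects per group.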



Author information

Corresponding author

Correspondence to Margaret A. Wojcicki.

Additional information

Editor: José Carlos Maldonado

Appendix

Tables 12, 13, 14, 15.

Table 12 GQM paradigm to determine the effectiveness and efficiency metrics of the experiment
Table 13 GQM paradigm to determine the metrics of the experiment
Table 14 Raw data collected in the experiment, including the number of false positives reported, the number of actual defects detected, the number of defects reported, the defect-detection rate, false positives identified as actual defects, actual defects identified as false positives, and the total defect-detection time (total time could be no longer than 1 h, a limit set by the experimenter); a sketch relating these raw data to derived measures follows the table captions
Table 15 Raw data collected in the experiment, including the motivation level, self-perceived mastery of V&V, years of experience with Java, the number of courses taken involving Java, and the number of courses taken involving code inspection
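
To make the derived measures behind Table 14 concrete, the sketch below shows how a defect-detection rate and a reporting precision could be computed from the listed raw data. The class and method names, and the precision measure itself, are illustrative assumptions rather than definitions taken from the paper.

/**
 * Illustrative only: derived measures consistent with the raw data
 * described in Table 14. Names and exact definitions are assumptions,
 * not taken from the paper.
 */
public class DefectDetectionMetrics {

    /** Defect-detection rate: actual defects detected per hour of effort. */
    static double detectionRatePerHour(int actualDefectsDetected, double totalTimeHours) {
        if (totalTimeHours <= 0) {
            throw new IllegalArgumentException("total time must be positive");
        }
        return actualDefectsDetected / totalTimeHours;
    }

    /** Reporting precision: fraction of reported defects that were actual defects. */
    static double reportingPrecision(int actualDefectsDetected, int defectsReported) {
        return defectsReported == 0 ? 0.0 : (double) actualDefectsDetected / defectsReported;
    }

    public static void main(String[] args) {
        // Example: a participant who reported 6 defects, 4 of them real,
        // within the 1-hour limit mentioned in the caption of Table 14.
        System.out.printf("rate = %.2f defects/hour%n", detectionRatePerHour(4, 1.0));
        System.out.printf("precision = %.2f%n", reportingPrecision(4, 6));
    }
}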


Cite this article

Wojcicki, M.A., Strooper, P. Maximising the information gained from a study of static analysis technologies for concurrent software. Empir Software Eng 12, 617–645 (2007). https://doi.org/10.1007/s10664-007-9044-6
