
Maximising the information gained from a study of static analysis technologies for concurrent software

Empirical Software Engineering

Abstract

The results of empirical studies in software engineering are limited to particular contexts and difficult to generalise, and the studies themselves are expensive to perform. Despite these problems, empirical studies can be made effective, and they are important to both researchers and practitioners. The key to their effectiveness lies in maximising the information gained: examining and replicating existing studies, and using power analyses to determine an accurate minimum sample size. This approach was applied in a controlled experiment examining the combination of automated static analysis tools and code inspection in the context of the verification and validation (V&V) of concurrent Java components. The paper presents the results of this controlled experiment and shows that the combination of automated static analysis and code inspection is cost-effective. A strategy to maximise the information gained was used throughout the experiment. As a result, despite the modest size of the study, conclusive results were obtained, contributing to the research on V&V technology evaluation.
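As an illustration of the power-analysis step mentioned above, the standard minimum-sample-size calculation for comparing two groups is sketched below. The effect size, significance level and power shown are conventional (Cohen-style) illustrative values, not the parameters used in this study.

% Minimum per-group sample size for a two-group comparison
% (normal approximation, standardised effect size d):
n \;\geq\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}}
% Illustrative values: d = 0.8, alpha = 0.05 (z_{0.975} = 1.96),
% power 1 - beta = 0.80 (z_{0.80} = 0.84):
% n >= 2 (1.96 + 0.84)^2 / 0.8^2 = 24.5, i.e. about 25 subjects per group.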



Author information

Corresponding author

Correspondence to Margaret A. Wojcicki.

Additional information

Editor: José Carlos Maldonado

Appendix

Tables 12, 13, 14, 15.

Table 12 GQM paradigm to determine the effectiveness and efficiency metrics of the experiment
Table 13 GQM paradigm to determine the metrics of the experiment
Table 14 Raw data collected in the experiment, including the number of false positives reported, the number of actual defects detected, the number of defects reported, the defect-detection rate, false positives identified as actual defects, actual defects identified as false positives, and the total defect-detection time (total time could be no longer than 1 h, a limit set by the experimenter); a sketch relating these raw data to derived measures follows the table captions
Table 15 Raw data collected in the experiment, including the motivation level, self-perceived mastery of V&V, years of experience with Java, the number of courses taken involving Java, and the number of courses taken involving code inspection
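
To make the derived measures behind Table 14 concrete, the sketch below shows how a defect-detection rate and a reporting precision could be computed from the listed raw data. The class and method names, and the precision measure itself, are illustrative assumptions rather than definitions taken from the paper.

/**
 * Illustrative only: derived measures consistent with the raw data
 * described in Table 14. Names and exact definitions are assumptions,
 * not taken from the paper.
 */
public class DefectDetectionMetrics {

    /** Defect-detection rate: actual defects detected per hour of effort. */
    static double detectionRatePerHour(int actualDefectsDetected, double totalTimeHours) {
        if (totalTimeHours <= 0) {
            throw new IllegalArgumentException("total time must be positive");
        }
        return actualDefectsDetected / totalTimeHours;
    }

    /** Reporting precision: fraction of reported defects that were actual defects. */
    static double reportingPrecision(int actualDefectsDetected, int defectsReported) {
        return defectsReported == 0 ? 0.0 : (double) actualDefectsDetected / defectsReported;
    }

    public static void main(String[] args) {
        // Example: a participant who reported 6 defects, 4 of them real,
        // within the 1-hour limit mentioned in the caption of Table 14.
        System.out.printf("rate = %.2f defects/hour%n", detectionRatePerHour(4, 1.0));
        System.out.printf("precision = %.2f%n", reportingPrecision(4, 6));
    }
}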


Cite this article

Wojcicki, M.A., Strooper, P. Maximising the information gained from a study of static analysis technologies for concurrent software. Empir Software Eng 12, 617–645 (2007). https://doi.org/10.1007/s10664-007-9044-6
