Testing the theory of relative defect proneness for closed-source software

Empirical Software Engineering

Abstract

Recent studies on open-source software (OSS) products report that smaller modules are proportionally more defect prone than larger ones. This phenomenon, referred to as the Theory of Relative Defect Proneness (RDP), challenges traditional QA approaches that give higher priority to larger modules, and it is attracting growing interest from closed-source software (CSS) practitioners. In this paper, we report the findings of a study in which we tested the theory of RDP using ten CSS products. The results clearly confirm the theory of RDP. We also demonstrate the practical implications of this theory in terms of defect-detection effectiveness. This study therefore not only makes a research contribution by rigorously testing a scientific theory on a different category of software products, but also provides practitioners with insights and evidence for revising their existing QA practices.

Notes

  1. It is worth noting that the conclusions drawn in Koru et al. (2007b, 2008, 2009) were not obtained by correlating or plotting size against defect density (defect count divided by size). Such an approach is problematic because it produces artificial ratio correlations, as demonstrated in Rosenberg (1997) and El Emam et al. (2002). Instead, Koru et al. simply examined the functional form of the relationship between size and defects.

  2. If one wants to show that the Theory of RDP does not hold, the null hypothesis should be stated as “smaller modules are proportionally more defect prone”, and strong statistical evidence should be obtained to reject this null hypothesis safely.

  3. Note that, even though they might look similar, the comparison plots presented in Fig. 4 are different from the concentration curves shown in Fig. 2. In concentration curves, the unique module sizes are identified along the x-axis; whereas, for the comparison plots in Fig. 4, the percentiles of total size are identified along the x-axis where modules are simply sorted from left to right in the increasing order of their size.

  4. Some names are adopted from Jonathan Swift’s 1726 novel, Gulliver’s Travels.
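The artificial ratio correlation mentioned in Note 1 can be illustrated with a small simulation. The following is only a minimal sketch on synthetic data, not the analysis used in the paper or in the cited studies: the function names, the size range, and the defect range are all hypothetical choices made for illustration. Defect counts are drawn independently of module size, so any correlation between size and defect density comes purely from dividing by size.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_modules=5000, seed=0):
    """Return (corr(size, defects), corr(size, defects / size)) for synthetic modules."""
    rng = random.Random(seed)
    # Hypothetical module sizes in lines of code (assumed range, illustration only).
    sizes = [rng.randint(10, 1000) for _ in range(n_modules)]
    # Defect counts drawn independently of size: no real size-defect relationship.
    defects = [rng.randint(0, 5) for _ in range(n_modules)]
    # Defect density shares the variable "size" with the x-axis through its denominator.
    density = [d / s for d, s in zip(defects, sizes)]
    return pearson(sizes, defects), pearson(sizes, density)


r_count, r_density = simulate()
print(f"corr(size, defect count)   = {r_count:+.3f}")
print(f"corr(size, defect density) = {r_density:+.3f}")
```

Under these assumptions, the first correlation stays close to zero while the second is clearly negative, even though size has no effect on the defect counts by construction. This is why a negative size-density correlation alone cannot be taken as evidence for or against the theory.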

References

  • Akiyama F (1971) An example of software system debuggings. In: Information processing 71, proceedings of IFIP congress 71, vol 1, pp 353–359

  • Andersson C, Runeson P (2007) A replicated quantitative analysis of fault distributions in complex software systems. IEEE Trans Softw Eng 33(5):273–286

  • Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun ACM 27(1):42–52

  • Briand LC, Wüst J, Lounis H (2001) Replicated case studies for investigating quality factors in object oriented designs. J Syst Softw 56(1):11–58

  • Briand LC, Melo WL, Wüst J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720

  • Chayes F (1971) Ratio correlation: a manual for students of petrology and geochemistry. University of Chicago Press, Chicago, IL

  • Cox DR (1972) Regression models and life tables. J R Stat Soc 34:187–220

  • Crouchley R, Pickles A (1993) A specification test for univariate and multivariate proportional hazards models. Biometrics 49:1067–1076

  • El Emam K (2005) The ROI from software quality. Auerbach Publications, Taylor and Francis Group, LLC, Boca Raton, FL

  • El Emam K, Koru G (2008) A replicated survey of IT software project failure rates. IEEE Softw 25(5):84–90. ISSN 0740-7459. doi:10.1109/MS.2008.107

  • El Emam K, Benlarbi S, Goel N, Rai SN (2001) The confounding effect of class size on the validity of object-oriented metrics. IEEE Trans Softw Eng 27(7):630–650

  • El Emam K, Benlarbi S, Goel N, Melo W, Lounis H, Rai SN (2002) The optimal class size for object-oriented software. IEEE Trans Softw Eng 28(5):494–509

  • Fenton N, Pfleeger SL (1996) Software metrics: a rigorous and practical approach, 2nd edn. PWS Publishing

  • Fenton NE, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689

  • Fenton NE, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814

  • Funami Y, Halstead MH (1976) A software physics analysis of Akiyama’s debugging data. In: Proceedings of MRI XXIV international symposium on computer software engineering, pp 133–138

  • Gaffney JE (1984) Estimating the number of faults in code. IEEE Trans Softw Eng 10(4):459–465

  • Halstead MH (1977) Elements of software science. Elsevier, Amsterdam, The Netherlands

  • Hamer PG, Frewin GD (1982) M.H. Halstead’s software science—a critical examination. In: ICSE ’82: proceedings of the 6th international conference on software engineering, pp 197–206

  • Harrell FE (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, Berlin Heidelberg New York

  • Harrell FE (2005) Design: design package. R package version 2.0-12

  • Hatton L (1997) Reexamining the fault density-component size connection. IEEE Softw 14(2):89–97

  • Hatton L (1998) Does OO sync with how we think? IEEE Softw 15(3):46–54

  • Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the fifth Berkeley symposium in mathematical statistics, vol 1, pp 221–233. University of California Press, Berkeley, CA

  • Kakwani N, Wagstaff A, van Doorslaer E (1997) Socioeconomic inequalities in health: measurement, computation, and statistical inference. J Econom 77:87–103

  • Khoshgoftaar TM, Szabo RM (1996) Using neural networks to predict software faults during testing. IEEE Trans Reliab 45(3):456–462

  • Khoshgoftaar TM, Allen EB, Hudepohl J, Aud SJ (1997) Applications of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Netw 8(4):902–909

  • Koru G, El Emam K, Neisa A, Umarji M (2007a) A survey of quality assurance practices in biomedical open-source software projects. J Med Internet Res 9(2). doi:10.2196/jmir.9.2.e8. http://www.jmir.org/2007/2/e8

  • Koru G, Zhang D, Liu H (2007b) Modeling the effect of size on defect proneness for open-source software. In: PROMISE ’07: proceedings of the third international workshop on predictor models in software engineering. IEEE Computer Society, Washington, DC, USA, pp 115–124. ISBN 0-7695-2954-2. doi:10.1109/PROMISE.2007.9

  • Koru G, Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304. ISSN 0098-5589. doi:10.1109/TSE.2008.90

  • Koru G, El Emam K, Zhang D, Liu H, Mathew D (2008) Theory of relative defect proneness. Empir Softw Eng 13(5):473–498. ISSN 1382-3256. doi:10.1007/s10664-008-9080-x

  • Lindsay RM, Ehrenberg ASC (1993) The design of replicated studies. Am Stat 47(3):217–228

  • Lipow M (1982) Number of faults per line of code. IEEE Trans Softw Eng 8(4):437–439

  • Mockus A, Fielding RT, Herbsleb J (2000) A case study of open source software development: the Apache server. In: Proceedings of the 22nd international conference on software engineering, ICSE 2000, pp 263–272

  • Mockus A, Fielding RT, Herbsleb J (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol 11(3):309–346

  • NASA IV&V Facility (2009, archival time) Metrics data program. Archived at http://www.webcitation.org/5fzQE6wVI

  • Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355

  • Ottenstein LM (1979) Quantitative estimates of debugging requirements. IEEE Trans Softw Eng 5(5):504–514

  • Popper K (1959) The logic of scientific discovery. Routledge classics, 2nd edn. Reprint, 2007

  • Porter A, Votta L (1998) Comparing detection methods for software requirements inspections: a replication using professional subjects. Empir Softw Eng 3(4):355–379

  • Porter AA, Selby RW (1990) Empirically guided software development using metric-based classification trees. IEEE Softw 7(2):46–54

  • Promise (2007) Promise data repository

  • Raymond ES (1999) The cathedral and the bazaar: musings on Linux and open source by an accidental revolutionary. O’Reilly and Associates, Sebastopol, CA, 95472, USA

  • Rosenberg J (1997) Some misconceptions about lines of code. In: METRICS ’97: proceedings of the 4th international symposium on software metrics. IEEE Computer Society, Washington, DC, USA, pp 137–142

  • Selby RW, Basili VR (1991) Analyzing error-prone system structure. IEEE Trans Softw Eng 17(2):141–152

  • Shen VY, Yu TJ, Thebaut SM, Paulsen L (1985) Identifying error-prone software—an empirical study. IEEE Trans Softw Eng 11(4):317–324

  • Shull F, Carver J, Travassos GH, Maldonado JC, Conradi R, Basili VR (2003) Replicated studies: building a body of knowledge about software reading techniques. In: Lecture notes on empirical software engineering. World Scientific Publishing Co, Inc, pp 39–84

  • Thayer R, Lipow M, Nelson E (1978) Software reliability. North-Holland

  • Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, Berlin Heidelberg New York

  • Wagstaff A, Paci P, van Doorslaer E (1991) On the measurement of inequalities in health. Soc Sci Med 33(5):545–557

  • Zhao L, Elbaum S (2003) Quality assurance under the open source development model. J Syst Softw 66(1):65–75

Acknowledgements

We would like to thank the creators, contributors, and maintainers of the PROMISE and NASA repositories, especially Tim Menzies and Justin Di Stefano, for the clarifications they made about some of the data sets.

Author information

Corresponding author

Correspondence to Gunes Koru.

Additional information

Editor: Nachiappan Nagappan

About this article

Cite this article

Koru, G., Liu, H., Zhang, D. et al. Testing the theory of relative defect proneness for closed-source software. Empir Software Eng 15, 577–598 (2010). https://doi.org/10.1007/s10664-010-9132-x

  • DOI: https://doi.org/10.1007/s10664-010-9132-x
