On modeling software defect repair time

Empirical Software Engineering

Abstract

The ability to predict the time required to repair software defects is important for both software quality management and maintenance. Estimated repair times can be used to improve the reliability and time-to-market of software under development. This paper presents an empirical approach to predicting defect repair times by constructing models that use well-established machine learning algorithms and defect data from past software defect reports. We describe, as a case study, the analysis of defect reports collected during the development of a large medical software system. Our predictive models give accuracies as high as 93.44%, despite the limitations of the available data. We present the proposed methodology along with detailed experimental results, which include comparisons with other analytical modeling approaches.

Acknowledgements

The authors would like to thank the referees and John Leuchner for their helpful comments, which have improved the quality of this paper. Special thanks to Anneliese Andrews and Catherine Stringfellow for providing the case study data used to illustrate the approach.

Author information

Corresponding author

Correspondence to Rattikorn Hewett.

Additional information

Editor: Tim Menzies

About this article

Cite this article

Hewett, R., Kijsanayothin, P. On modeling software defect repair time. Empir Software Eng 14, 165–186 (2009). https://doi.org/10.1007/s10664-008-9064-x

  • DOI: https://doi.org/10.1007/s10664-008-9064-x
