Abstract
The ability to predict the time required to repair software defects is important for both software quality management and maintenance. Estimated repair times can be used to improve the reliability and time-to-market of software under development. This paper presents an empirical approach to predicting defect repair times by constructing models that use well-established machine learning algorithms and defect data from past software defect reports. We describe, as a case study, the analysis of defect reports collected during the development of a large medical software system. Our predictive models give accuracies as high as 93.44%, despite the limitations of the available data. We present the proposed methodology along with detailed experimental results, which include comparisons with other analytical modeling approaches.
Acknowledgements
The authors would like to thank the referees and John Leuchner for their helpful comments, which have improved the quality of this paper. Special thanks to Anneliese Andrews and Catherine Stringfellow for providing the case study data used to illustrate the approach.
Additional information
Editor: Tim Menzies
About this article
Cite this article
Hewett, R., Kijsanayothin, P. On modeling software defect repair time. Empir Software Eng 14, 165–186 (2009). https://doi.org/10.1007/s10664-008-9064-x
DOI: https://doi.org/10.1007/s10664-008-9064-x