Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques

Empirical Software Engineering

Abstract

High-assurance and complex mission-critical software systems depend heavily on the reliability of their underlying software applications. Early software fault prediction is a proven technique for achieving high software reliability. Prediction models based on software metrics can predict the number of faults in software modules. Timely predictions from such models can be used to direct cost-effective quality enhancement efforts to modules that are likely to have a high number of faults. We evaluate the predictive performance of six commonly used fault prediction techniques: CART-LS (least squares), CART-LAD (least absolute deviation), S-PLUS, multiple linear regression, artificial neural networks, and case-based reasoning. The case study consists of software metrics collected over four releases of a very large telecommunications system. Two performance metrics, average absolute error and average relative error, are used to gauge the accuracy of the different prediction models. Models were built using both the original software metrics (RAW) and their principal components (PCA). Two-way ANOVA randomized complete block design models with two blocking variables are constructed, with average absolute and average relative errors as response variables. System release and model type (RAW or PCA) form the blocking variables, and the prediction technique is treated as a factor. Using multiple pairwise comparisons, the performance order of the prediction models is determined. We observe that for both average absolute and average relative errors, the CART-LAD model performs best while the S-PLUS model is ranked sixth.
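
As an illustration of the two accuracy measures named in the abstract, the following minimal Python sketch computes an average absolute error and an average relative error over a set of modules. The abstract does not give the formulas, so the definitions here are assumptions: in particular, the `+ 1` in the relative-error denominator is one common way to avoid division by zero for fault-free modules and may differ from the article's exact definition. The function names and the sample data are hypothetical.

```python
from typing import Sequence


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all modules (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / (actual + 1) over all modules.

    The "+ 1" is an assumption made here so that modules with zero
    observed faults do not cause a division by zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical fault counts for three modules (not data from the study).
observed_faults = [0, 2, 5]
predicted_faults = [1, 2, 3]

aae = average_absolute_error(observed_faults, predicted_faults)
are = average_relative_error(observed_faults, predicted_faults)
print(f"AAE = {aae:.4f}, ARE = {are:.4f}")
```

Under these assumed definitions, lower values indicate more accurate predictions for both measures, which is the sense in which the abstract ranks the six techniques.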

Cite this article

Khoshgoftaar, T.M., Seliya, N. Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques. Empirical Software Engineering 8, 255–283 (2003). https://doi.org/10.1023/A:1024424811345