Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques

Khoshgoftaar, Taghi M.; Seliya, Naeem

doi:10.1023/A:1024424811345

Published: September 2003

Volume 8, pages 255–283, (2003)

Empirical Software Engineering


Abstract

High-assurance and complex mission-critical software systems depend heavily on the reliability of their underlying software applications. Early software fault prediction is a proven technique for achieving high software reliability. Prediction models based on software metrics can predict the number of faults in software modules. Timely predictions from such models can be used to direct cost-effective quality enhancement efforts to modules that are likely to have a high number of faults. We evaluate the predictive performance of six commonly used fault prediction techniques: CART-LS (least squares), CART-LAD (least absolute deviation), S-PLUS, multiple linear regression, artificial neural networks, and case-based reasoning. The case study consists of software metrics collected over four releases of a very large telecommunications system. Two performance metrics, average absolute error and average relative error, are used to gauge the accuracy of the different prediction models. Models were built using both the original software metrics (RAW) and their principal components (PCA). Two-way ANOVA randomized complete block design models with two blocking variables are designed, with average absolute and average relative errors as response variables. System release and model type (RAW or PCA) form the blocking variables, and the prediction technique is treated as a factor. Using multiple pairwise comparisons, the performance order of the prediction models is determined. We observe that, for both average absolute and average relative errors, the CART-LAD model performs best while the S-PLUS model is ranked sixth.
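
The two accuracy measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are assumptions: average absolute error is taken as the mean of |actual - predicted| over modules, and average relative error divides each absolute error by (actual + 1), where the added 1 is an assumed guard against division by zero for modules with no faults. The function names and sample values are hypothetical and are not taken from the article.

```python
from typing import Sequence


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all modules (assumed definition)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / (actual + 1) over all modules.

    The +1 in the denominator is an assumption made here so that
    modules with zero observed faults do not cause division by zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical fault counts for three modules (illustrative values only).
actual_faults = [0, 2, 4]
predicted_faults = [1, 2, 2]

aae = average_absolute_error(actual_faults, predicted_faults)
are = average_relative_error(actual_faults, predicted_faults)
print(f"AAE = {aae:.3f}, ARE = {are:.3f}")
```

Under these assumed definitions, lower values of either measure indicate a more accurate model, which is the sense in which the techniques are ranked in the abstract.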


Author information

Authors and Affiliations

Florida Atlantic University, Boca Raton, Florida
Taghi M. Khoshgoftaar & Naeem Seliya

About this article

Cite this article

Khoshgoftaar, T.M., Seliya, N. Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques. Empirical Software Engineering 8, 255–283 (2003). https://doi.org/10.1023/A:1024424811345

Issue Date: September 2003
DOI: https://doi.org/10.1023/A:1024424811345
