Abstract
High-assurance and complex mission-critical software systems depend heavily on the reliability of their underlying software applications. Early software fault prediction is a proven technique for achieving high software reliability. Prediction models based on software metrics can predict the number of faults in software modules. Timely predictions from such models can be used to direct cost-effective quality enhancement efforts to modules that are likely to have a high number of faults. We evaluate the predictive performance of six commonly used fault prediction techniques: CART-LS (least squares), CART-LAD (least absolute deviation), S-PLUS, multiple linear regression, artificial neural networks, and case-based reasoning. The case study consists of software metrics collected over four releases of a very large telecommunications system. Two performance metrics, average absolute error and average relative error, are used to gauge the accuracy of the different prediction models. Models were built using both the original software metrics (RAW) and their principal components (PCA). Two-way ANOVA randomized complete block design models with two blocking variables are designed with average absolute and average relative errors as response variables. System release and model type (RAW or PCA) form the blocking variables, and the prediction technique is treated as a factor. Using multiple pairwise comparisons, the performance order of the prediction models is determined. We observe that for both average absolute and average relative errors, the CART-LAD model performs best while the S-PLUS model ranks sixth.
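The two accuracy measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give their formulas, so the definitions below are assumptions: average absolute error is taken as the mean of |actual - predicted|, and average relative error as the mean of |actual - predicted| / (actual + 1), where the +1 is assumed here only to avoid division by zero for fault-free modules. The function names and the fault counts are hypothetical and are not data from the study.

```python
from typing import Sequence


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all modules."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / (actual + 1) over all modules.

    Assumption: the +1 in the denominator is added so that modules
    with zero actual faults do not cause a division by zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical fault counts for four modules (illustrative values only).
actual_faults = [0, 2, 5, 1]
predicted_faults = [1.0, 2.0, 3.0, 1.0]

aae = average_absolute_error(actual_faults, predicted_faults)
are = average_relative_error(actual_faults, predicted_faults)
print(f"AAE = {aae:.3f}, ARE = {are:.3f}")
```

Under these assumed definitions, a lower value of either measure indicates a more accurate model, which is the sense in which the techniques are ranked.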
Cite this article
Khoshgoftaar, T.M., Seliya, N. Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques. Empirical Software Engineering 8, 255–283 (2003). https://doi.org/10.1023/A:1024424811345
DOI: https://doi.org/10.1023/A:1024424811345