Classification of Fault-Prone Software Modules: Prior Probabilities, Costs, and Model Evaluation

Khoshgoftaar, Taghi M.; Allen, Edward B.

doi:10.1023/A:1009736205722

Published: September 1998

Volume 3, pages 275–298, (1998)

Empirical Software Engineering

Abstract

Software quality models can give timely predictions of reliability indicators, for targeting software improvement efforts. In some cases, classification techniques are sufficient for useful software quality models.

The software engineering community has not applied informed prior probabilities widely to software quality classification modeling studies. Moreover, even though costs are of paramount concern to software managers, costs of misclassification have received little attention in the software engineering literature. This paper applies informed prior probabilities and costs of misclassification to software quality classification. We also discuss the advantages and limitations of several statistical methods for evaluating the accuracy of software quality classification models.
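As a rough illustration of how informed priors and misclassification costs can enter a classification rule, the sketch below applies the standard two-class minimum-expected-cost rule from multivariate statistics textbooks. It is not taken from the paper itself. The function name, the density values, the prior, and the costs are all hypothetical, and the naming of the two error types (Type I: a not fault-prone module labeled fault-prone; Type II: a fault-prone module labeled not fault-prone) is an assumption of this sketch.

```python
def classify_min_expected_cost(f_nfp, f_fp, prior_fp, cost_type1, cost_type2):
    """Label one module using the two-class minimum-expected-cost rule.

    f_nfp, f_fp : class-conditional density values at the module's metrics
                  (hypothetical inputs, e.g. from a density estimate)
    prior_fp    : informed prior probability that a module is fault-prone
    cost_type1  : assumed cost of calling a not fault-prone module fault-prone
    cost_type2  : assumed cost of calling a fault-prone module not fault-prone
    """
    prior_nfp = 1.0 - prior_fp
    # Expected cost of each decision, up to a common normalizing factor.
    cost_if_labeled_fp = cost_type1 * prior_nfp * f_nfp
    cost_if_labeled_nfp = cost_type2 * prior_fp * f_fp
    if cost_if_labeled_fp < cost_if_labeled_nfp:
        return "fault-prone"
    return "not fault-prone"


# Same module and prior, two different cost assumptions.
equal_costs = classify_min_expected_cost(0.6, 0.4, prior_fp=0.1,
                                         cost_type1=1.0, cost_type2=1.0)
costly_misses = classify_min_expected_cost(0.6, 0.4, prior_fp=0.1,
                                           cost_type1=1.0, cost_type2=20.0)
print(equal_costs, "|", costly_misses)
```

With equal costs the module is labeled not fault-prone. Raising the assumed Type II cost to 20 flips the decision even though the prior and the densities are unchanged, which is the kind of effect a cost-aware model is meant to capture.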

We conducted two full-scale industrial case studies which integrated these concepts with nonparametric discriminant analysis to illustrate how they can be used by a classification technique. The case studies supported our hypothesis that classification models of software quality can benefit by considering informed prior probabilities and by minimizing the expected cost of misclassifications. The case studies also illustrated the advantages and limitations of resubstitution, cross-validation, and data splitting for model evaluation.
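The three evaluation methods named above can be contrasted on a toy example. The sketch below is only an assumption-laden stand-in: it uses a one-dimensional Gaussian kernel density classifier in place of the paper's nonparametric discriminant analysis, takes priors from class proportions, and runs on seven made-up modules with a deliberately small bandwidth. None of the names or numbers come from the case studies.

```python
import math


def kernel_density(x, sample, bandwidth):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def predict(x, fit, bandwidth):
    """Pick the class with the largest prior-weighted density at x.

    fit is a list of (value, label) pairs; each prior is the class
    proportion in fit (an assumption made for this sketch).
    """
    best_label, best_score = None, -1.0
    for label in sorted({lab for _, lab in fit}):
        sample = [v for v, lab in fit if lab == label]
        score = (len(sample) / len(fit)) * kernel_density(x, sample, bandwidth)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def error_rate(test, fit, bandwidth):
    """Fraction of test modules misclassified by a model built on fit."""
    wrong = sum(1 for v, lab in test if predict(v, fit, bandwidth) != lab)
    return wrong / len(test)


def resubstitution(data, bandwidth):
    """Evaluate on the same modules used to build the model."""
    return error_rate(data, data, bandwidth)


def leave_one_out(data, bandwidth):
    """Cross-validation that holds out one module at a time."""
    wrong = 0
    for i, (v, lab) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if predict(v, rest, bandwidth) != lab:
            wrong += 1
    return wrong / len(data)


# Hypothetical modules: (metric value, class label).
MODULES = [(1.0, "nfp"), (2.0, "nfp"), (3.0, "nfp"), (6.0, "nfp"),
           (5.0, "fp"), (7.0, "fp"), (8.5, "fp")]
BANDWIDTH = 0.1  # deliberately small, so each module dominates its own estimate

# Data splitting: build on one subset, evaluate on the other.
FIT = [(1.0, "nfp"), (2.0, "nfp"), (5.0, "fp"), (7.0, "fp")]
TEST = [(3.0, "nfp"), (6.0, "nfp"), (8.5, "fp")]

print("resubstitution:", resubstitution(MODULES, BANDWIDTH))
print("leave-one-out: ", leave_one_out(MODULES, BANDWIDTH))
print("data splitting:", error_rate(TEST, FIT, BANDWIDTH))
```

On this toy data resubstitution reports no errors because every module helps classify itself, while leave-one-out reports a higher rate. Data splitting avoids that reuse but leaves fewer modules for building the model.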

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL, 33431, USA
Taghi M. Khoshgoftaar & Edward B. Allen

About this article

Cite this article

Khoshgoftaar, T.M., Allen, E.B. Classification of Fault-Prone Software Modules: Prior Probabilities, Costs, and Model Evaluation. Empirical Software Engineering 3, 275–298 (1998). https://doi.org/10.1023/A:1009736205722

Issue Date: September 1998
DOI: https://doi.org/10.1023/A:1009736205722
