Abstract
Software quality models can give timely predictions of reliability indicators, which help target software improvement efforts. In some cases, classification techniques are sufficient for useful software quality models.
The software engineering community has not applied informed prior probabilities widely to software quality classification modeling studies. Moreover, even though costs are of paramount concern to software managers, costs of misclassification have received little attention in the software engineering literature. This paper applies informed prior probabilities and costs of misclassification to software quality classification. We also discuss the advantages and limitations of several statistical methods for evaluating the accuracy of software quality classification models.
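To make the role of priors and costs concrete, the following is a minimal sketch of the textbook minimum-expected-cost rule for two classes, assuming estimated class-conditional densities are already available. The function name, parameter names, and the numbers in the usage lines are illustrative assumptions, not taken from the case studies.

```python
def classify(f_nfp, f_fp, prior_nfp, prior_fp, cost_type1, cost_type2):
    """Assign a module to the class with the lower expected misclassification cost.

    f_nfp, f_fp   : estimated densities of the module's metrics under the
                    not-fault-prone (nfp) and fault-prone (fp) classes
    prior_nfp/fp  : prior probabilities of class membership
    cost_type1    : cost of calling a not-fault-prone module fault-prone
    cost_type2    : cost of calling a fault-prone module not fault-prone
    All names here are hypothetical, chosen for illustration.
    """
    # Expected cost of each decision, up to a shared normalizing factor.
    cost_if_say_nfp = cost_type2 * prior_fp * f_fp
    cost_if_say_fp = cost_type1 * prior_nfp * f_nfp
    return "nfp" if cost_if_say_nfp <= cost_if_say_fp else "fp"


# Illustrative values only: identical densities, fault-prone modules are rare.
equal_costs = classify(0.5, 0.5, prior_nfp=0.9, prior_fp=0.1,
                       cost_type1=1, cost_type2=1)
costly_misses = classify(0.5, 0.5, prior_nfp=0.9, prior_fp=0.1,
                         cost_type1=1, cost_type2=20)
```

With equal costs the rare class loses to the prior; raising the cost of missing a fault-prone module shifts the same module into the fault-prone class, which is the effect the cost ratio is meant to capture.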
We conducted two full-scale industrial case studies that integrated these concepts with nonparametric discriminant analysis to illustrate how a classification technique can use them. The case studies supported our hypothesis that classification models of software quality can benefit from considering informed prior probabilities and from minimizing the expected cost of misclassifications. The case studies also illustrated the advantages and limitations of resubstitution, cross-validation, and data splitting for model evaluation.
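The three evaluation methods named above can be contrasted with a small sketch. It assumes a one-metric Gaussian kernel density classifier as a stand-in for nonparametric discriminant analysis, priors taken from class proportions, and equal costs; the data, bandwidth, and function names are invented for illustration and are not the case-study data or the authors' implementation.

```python
import math


def kernel_density(x, sample, bandwidth):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def predict(x, train, bandwidth=1.0):
    """Pick the class with the larger prior-weighted density (equal costs assumed)."""
    scores = {}
    for label in ("nfp", "fp"):
        members = [v for v, l in train if l == label]
        prior = len(members) / len(train)
        scores[label] = prior * kernel_density(x, members, bandwidth) if members else 0.0
    return max(scores, key=scores.get)


def error_rate(fit, evaluate):
    """Fraction of `evaluate` misclassified by a model built from `fit`."""
    return sum(predict(v, fit) != l for v, l in evaluate) / len(evaluate)


def resubstitution(data):
    """Evaluate on the same observations used to build the model."""
    return error_rate(data, data)


def leave_one_out(data):
    """Cross-validation: classify each observation with it held out of the fit."""
    wrong = sum(predict(v, data[:i] + data[i + 1:]) != l
                for i, (v, l) in enumerate(data))
    return wrong / len(data)


# Hypothetical toy data: (metric value, class). The fault-prone module at 4
# sits close to the not-fault-prone cluster.
modules = [(1, "nfp"), (2, "nfp"), (3, "nfp"), (4, "fp"), (8, "fp"), (9, "fp")]

resub_error = resubstitution(modules)
loo_error = leave_one_out(modules)
# Data splitting: fit on one subset, test on a disjoint one.
split_error = error_rate([(1, "nfp"), (3, "nfp"), (8, "fp")],
                         [(2, "nfp"), (4, "fp"), (9, "fp")])
```

On this toy data, resubstitution reports no errors because the isolated module helps classify itself, while leave-one-out and the split both misclassify it. This is the familiar optimism of resubstitution, shown here only as an illustration on made-up numbers.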
Cite this article
Khoshgoftaar, T.M., Allen, E.B. Classification of Fault-Prone Software Modules: Prior Probabilities, Costs, and Model Evaluation. Empirical Software Engineering 3, 275–298 (1998). https://doi.org/10.1023/A:1009736205722