
Analogy-Based Practical Classification Rules for Software Quality Estimation

Published in: Empirical Software Engineering

Abstract

Software metrics-based quality estimation models can be effective tools for identifying which modules are likely to be fault-prone or not fault-prone. The use of such models prior to system deployment can considerably reduce the likelihood of faults discovered during operations, hence improving system reliability. A software quality classification model is calibrated using metrics from a past release or similar project, and is then applied to modules currently under development. Subsequently, a timely prediction of which modules are likely to have faults can be obtained. However, software quality classification models used in practice may not provide a useful balance between the two misclassification rates, especially when there are very few faulty modules in the system being modeled.
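The two misclassification rates mentioned above are, following the usual convention in this research area (the full paper defines them formally): a Type I error, in which a not-fault-prone module is classified as fault-prone, and a Type II error, in which a fault-prone module is classified as not fault-prone. The sketch below is a minimal, hypothetical illustration of how these rates are computed; the module labels are invented for the example.

```python
# A minimal sketch, assuming the usual convention in this research area (not taken
# verbatim from the paper): "fp" = fault-prone, "nfp" = not fault-prone.
# All module labels below are hypothetical.

def misclassification_rates(actual, predicted):
    """Return (type_I, type_II) error rates.

    Type I : a not-fault-prone module classified as fault-prone (a false alarm).
    Type II: a fault-prone module classified as not fault-prone (a missed fault).
    """
    nfp_total = sum(1 for a in actual if a == "nfp")
    fp_total = sum(1 for a in actual if a == "fp")
    type_I = sum(1 for a, p in zip(actual, predicted)
                 if a == "nfp" and p == "fp") / nfp_total
    type_II = sum(1 for a, p in zip(actual, predicted)
                  if a == "fp" and p == "nfp") / fp_total
    return type_I, type_II

# Eight modules from a hypothetical current release.
actual    = ["nfp", "nfp", "fp", "nfp", "fp", "nfp", "nfp", "fp"]
predicted = ["nfp", "fp",  "fp", "nfp", "nfp", "nfp", "nfp", "fp"]
print(misclassification_rates(actual, predicted))  # (0.2, 0.333...)
```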

This paper presents, in the context of case-based reasoning, two practical classification rules that allow each type of misclassification to be emphasized as project requirements dictate. The suggested techniques are especially useful for high-assurance systems in which faulty modules are rare. The proposed generalized classification methods account for the costs of misclassification and for the unbalanced distribution of faulty program modules. We illustrate the techniques with a case study consisting of software measurements and fault data collected over multiple releases of a large-scale legacy telecommunication system. In addition to investigating the two classification methods, we present a brief comparative evaluation of the techniques. The levels of classification accuracy and model robustness observed for the case study indicate that the techniques can contribute to high software reliability in subsequent system releases. Similar observations emerged from our empirical studies of other case studies.
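The abstract does not spell out the two classification rules themselves, so the sketch below is only a rough, hypothetical illustration of the general idea: a case-based (nearest-neighbour) classifier whose decision is shifted by a cost ratio, so that missing a rare fault-prone module is penalised more heavily than raising a false alarm. The metric values, the cost ratio, and the simple weighted vote are illustrative assumptions, not the paper's actual rules.

```python
# Hedged illustration only (not the paper's rule): a k-nearest-neighbour, case-based
# classifier whose decision threshold is shifted by a cost ratio
# c = cost(Type II) / cost(Type I), penalising missed fault-prone modules more heavily.
import math

def classify_module(new_metrics, case_library, k=3, cost_ratio=5.0):
    """case_library: list of (metrics_vector, label) with label in {"fp", "nfp"}.
    Returns "fp" if the cost-weighted vote of the k nearest cases favours fault-prone."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(case_library, key=lambda case: dist(new_metrics, case[0]))[:k]
    fp_votes = sum(1 for _, label in neighbors if label == "fp")
    nfp_votes = k - fp_votes
    # Weight the fault-prone evidence by the cost ratio before comparing.
    return "fp" if cost_ratio * fp_votes > nfp_votes else "nfp"

# Hypothetical case library from a past release: (LOC/100, cyclomatic complexity/10).
library = [((1.2, 0.8), "nfp"), ((0.5, 0.3), "nfp"), ((3.1, 2.5), "fp"),
           ((2.8, 2.2), "fp"), ((0.9, 0.6), "nfp"), ((1.0, 0.7), "nfp")]
print(classify_module((2.5, 2.0), library))  # -> "fp"
print(classify_module((0.6, 0.4), library))  # -> "nfp"
```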




Cite this article

Khoshgoftaar, T.M., Seliya, N. Analogy-Based Practical Classification Rules for Software Quality Estimation. Empirical Software Engineering 8, 325–350 (2003). https://doi.org/10.1023/A:1025316301168
