
Analogy-Based Practical Classification Rules for Software Quality Estimation

Published in: Empirical Software Engineering

Abstract

Software metrics-based quality estimation models can be effective tools for identifying which modules are likely to be fault-prone or not fault-prone. The use of such models prior to system deployment can considerably reduce the likelihood of faults discovered during operations, hence improving system reliability. A software quality classification model is calibrated using metrics from a past release or similar project, and is then applied to modules currently under development. Subsequently, a timely prediction of which modules are likely to have faults can be obtained. However, software quality classification models used in practice may not provide a useful balance between the two misclassification rates, especially when there are very few faulty modules in the system being modeled.
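The two misclassification rates mentioned above are, following the usual convention in this research area (the full paper defines them formally): a Type I error, in which a not-fault-prone module is classified as fault-prone, and a Type II error, in which a fault-prone module is classified as not fault-prone. The sketch below is a minimal, hypothetical illustration of how these rates are computed; the module labels are invented for the example.

```python
# A minimal sketch, assuming the usual convention in this research area (not taken
# verbatim from the paper): "fp" = fault-prone, "nfp" = not fault-prone.
# All module labels below are hypothetical.

def misclassification_rates(actual, predicted):
    """Return (type_I, type_II) error rates.

    Type I : a not-fault-prone module classified as fault-prone (a false alarm).
    Type II: a fault-prone module classified as not fault-prone (a missed fault).
    """
    nfp_total = sum(1 for a in actual if a == "nfp")
    fp_total = sum(1 for a in actual if a == "fp")
    type_I = sum(1 for a, p in zip(actual, predicted)
                 if a == "nfp" and p == "fp") / nfp_total
    type_II = sum(1 for a, p in zip(actual, predicted)
                  if a == "fp" and p == "nfp") / fp_total
    return type_I, type_II

# Eight modules from a hypothetical current release.
actual    = ["nfp", "nfp", "fp", "nfp", "fp", "nfp", "nfp", "fp"]
predicted = ["nfp", "fp",  "fp", "nfp", "nfp", "nfp", "nfp", "fp"]
print(misclassification_rates(actual, predicted))  # (0.2, 0.333...)
```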

This paper presents, in the context of case-based reasoning, two practical classification rules that allow each type of misclassification to be emphasized as project requirements dictate. The suggested techniques are especially useful for high-assurance systems in which faulty modules are rare. The proposed generalized classification methods account for the costs of misclassification and for the unbalanced distribution of faulty program modules. We illustrate the techniques with a case study consisting of software measurements and fault data collected over multiple releases of a large-scale legacy telecommunication system. In addition to investigating the two classification methods, we present a brief comparative evaluation of the techniques. The levels of classification accuracy and model robustness observed for the case study indicate that the techniques can contribute to high software reliability in subsequent system releases. Similar observations emerged from our empirical studies of other case studies.
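The abstract does not spell out the two classification rules themselves, so the sketch below is only a rough, hypothetical illustration of the general idea: a case-based (nearest-neighbour) classifier whose decision is shifted by a cost ratio, so that missing a rare fault-prone module is penalised more heavily than raising a false alarm. The metric values, the cost ratio, and the simple weighted vote are illustrative assumptions, not the paper's actual rules.

```python
# Hedged illustration only (not the paper's rule): a k-nearest-neighbour, case-based
# classifier whose decision threshold is shifted by a cost ratio
# c = cost(Type II) / cost(Type I), penalising missed fault-prone modules more heavily.
import math

def classify_module(new_metrics, case_library, k=3, cost_ratio=5.0):
    """case_library: list of (metrics_vector, label) with label in {"fp", "nfp"}.
    Returns "fp" if the cost-weighted vote of the k nearest cases favours fault-prone."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(case_library, key=lambda case: dist(new_metrics, case[0]))[:k]
    fp_votes = sum(1 for _, label in neighbors if label == "fp")
    nfp_votes = k - fp_votes
    # Weight the fault-prone evidence by the cost ratio before comparing.
    return "fp" if cost_ratio * fp_votes > nfp_votes else "nfp"

# Hypothetical case library from a past release: (LOC/100, cyclomatic complexity/10).
library = [((1.2, 0.8), "nfp"), ((0.5, 0.3), "nfp"), ((3.1, 2.5), "fp"),
           ((2.8, 2.2), "fp"), ((0.9, 0.6), "nfp"), ((1.0, 0.7), "nfp")]
print(classify_module((2.5, 2.0), library))  # -> "fp"
print(classify_module((0.6, 0.4), library))  # -> "nfp"
```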




Cite this article

Khoshgoftaar, T.M., Seliya, N. Analogy-Based Practical Classification Rules for Software Quality Estimation. Empirical Software Engineering 8, 325–350 (2003). https://doi.org/10.1023/A:1025316301168
