Applying the Mahalanobis-Taguchi strategy for software defect diagnosis

Automated Software Engineering

Abstract

The Mahalanobis-Taguchi (MT) strategy combines mathematical and statistical concepts such as the Mahalanobis distance, Gram-Schmidt orthogonalization and experimental designs to support diagnosis and decision-making based on multivariate data. Its primary purpose is to develop a scale that measures the degree of abnormality of cases relative to “normal” or “healthy” cases, i.e. to derive a continuous scale from a set of binary classified cases. An optimal subset of variables for measuring abnormality is then selected, and rules for future diagnosis are defined based on this subset and the measurement scale. This maps well to software defect prediction based on a multivariate set of software metrics and attributes. In this paper, the MT strategy, combined with a cluster analysis technique for determining the most appropriate training set, is described and applied to well-known datasets in order to evaluate the fault-proneness of software modules. The measurement scale resulting from the MT strategy is evaluated using ROC curves, and the results indicate that it is a promising technique for software defect diagnosis. It compares favorably to previously evaluated methods on a number of publicly available datasets. A special characteristic of the MT strategy, namely that it quantifies the level of abnormality, can also stimulate and inform discussions with engineers and managers in different defect prediction situations.
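The core idea of the measurement scale can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the scaled Mahalanobis distance computed against a "normal" reference group and an AUC check of that scale, and it leaves out the Gram-Schmidt variant, the variable selection via experimental designs and the cluster-based choice of training set. The data are synthetic, and all names (`fit_reference`, `scaled_md`, `auc`) are hypothetical.

```python
import numpy as np


def fit_reference(normal):
    """Estimate the reference ("healthy") space from the normal group only."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0)  # population standard deviation (ddof=0)
    z = (normal - mean) / std
    corr_inv = np.linalg.inv(np.corrcoef(z, rowvar=False))
    return mean, std, corr_inv


def scaled_md(cases, mean, std, corr_inv):
    """Scaled Mahalanobis distance: (1/k) * z^T C^-1 z for each case."""
    z = (cases - mean) / std  # standardize with the normal group's statistics
    k = cases.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k


def auc(pos, neg):
    """Area under the ROC curve as the probability that a positive case
    scores higher than a negative one (ties count as one half)."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


# Synthetic stand-in for software metrics (assumption: not data from the paper).
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 4))            # "healthy" modules
abnormal = rng.normal(loc=3.0, size=(50, 4))  # "faulty" modules, shifted metrics

mean, std, corr_inv = fit_reference(normal)
md_normal = scaled_md(normal, mean, std, corr_inv)
md_abnormal = scaled_md(abnormal, mean, std, corr_inv)
auc_value = auc(md_abnormal, md_normal)

print("mean MD, normal group:  ", md_normal.mean())
print("mean MD, abnormal group:", md_abnormal.mean())
print("AUC of the MD scale:    ", auc_value)
```

With this scaling the distances of the normal group average to 1 by construction, so values well above 1 can be read as increasing degrees of abnormality. Here the AUC is computed on the same cases used to build the reference space; a real evaluation would use held-out modules.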

Abbreviations

  • MT: Mahalanobis-Taguchi
  • MD: Mahalanobis distance
  • MTS: Mahalanobis-Taguchi system
  • MTGS: Mahalanobis-Taguchi Gram-Schmidt process
  • ROC: Receiver Operating Characteristic
  • AUC: Area under the curve

Author information

Corresponding author

Correspondence to Robert Feldt.

About this article

Cite this article

Liparas, D., Angelis, L. & Feldt, R. Applying the Mahalanobis-Taguchi strategy for software defect diagnosis. Autom Softw Eng 19, 141–165 (2012). https://doi.org/10.1007/s10515-011-0091-2

  • DOI: https://doi.org/10.1007/s10515-011-0091-2
