A generic all-purpose transformation for multivariate modeling through copulas

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

Copulas have been used in various applications in the biomedical sciences and finance. We propose copulas as generic, all-purpose transformations that enable various standard multivariate procedures to be applied more efficiently and with better statistical properties and results. More specifically, we consider the problem of transforming any continuous data to multivariate normality, using copulas as the device that defines the transformation. Such a transformation enables us to model a variety of problems involving non-normal data with classical multivariate statistical techniques. We evaluate and illustrate applications in regression, multicollinearity, principal component analysis, factor analysis, partial least squares modeling and structural equation modeling, where analyses using the appropriate copula transformations yield substantial improvements in implementation, interpretation and prediction, as well as in the resulting models. Numerous datasets from the literature are analyzed, and the results amply demonstrate the power of this approach.
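
This preview does not show which copulas or marginal estimators the authors use. As a hedged illustration of the general idea of transforming continuous data toward multivariate normality, the sketch below assumes empirical margins and a Gaussian copula: each variable is replaced by its normal scores, \(z = \Phi^{-1}(\text{rank}/(n+1))\). The helper names `ranks`, `normal_scores` and `copula_transform` are hypothetical, and this is not necessarily the transformation used in the paper.

```python
from statistics import NormalDist


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def normal_scores(column):
    """Empirical CDF via pseudo-observations u = rank / (n + 1), then z = Phi^{-1}(u)."""
    n = len(column)
    phi_inv = NormalDist().inv_cdf
    return [phi_inv(r / (n + 1)) for r in ranks(column)]


def copula_transform(rows):
    """Apply normal_scores to every column of a list of observation rows."""
    columns = [list(col) for col in zip(*rows)]
    transformed = [normal_scores(col) for col in columns]
    return [list(row) for row in zip(*transformed)]


# hypothetical toy data: two continuous variables, the second heavily skewed
data = [[1.0, 10.0], [2.0, 1000.0], [3.0, 100.0]]
for row in copula_transform(data):
    print([round(z, 4) for z in row])
```

Dividing ranks by n + 1 rather than n keeps every u strictly inside (0, 1), so \(\Phi^{-1}(u)\) stays finite. This step only makes each margin approximately normal; the transformed vector is jointly normal only under the Gaussian-copula assumption, after which classical procedures (regression, PCA, factor analysis) can be run on the scores.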

Figs. 1–13

Notes

  1. The variance inflation factor for the ith variable is defined as \((VIF)_i = [\varvec{R}^{-1}_{xx}]_{ii} = \frac{1}{1-R^2_i},\) where \(\varvec{R}_{xx}\) is the sample correlation matrix of the predictor variables and \(R^2_i\) is the \(R^2\) value obtained when \(x_i\) is regressed on \(x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots ,x_k.\) See [20] for details.

  2. Let \(\lambda _1< \lambda _2< \dots < \lambda _{p}\) be the ordered eigenvalues of \(\varvec{X'X},\) where \(\varvec{X}\) is the \(n \times p\) data matrix on the independent variables, \(p = k+1.\) Then the ith antieigenvalue of \(\varvec{X'X}\) (Khattree, 2016 [14]) is defined as \(\eta _i^* = \frac{2\sqrt{\lambda _i \lambda _{p-i+1}}}{\lambda _i + \lambda _{p-i+1}},\) \(i=1,2, \dots , r,\) where \(r=[p/2]\) is the integer part of \(p/2.\) All \(\lambda _i > 0\) when \(\varvec{X}\) has full column rank, and \(\lambda _i < \lambda _{p-i+1}\) for \(i \le r.\) Since the geometric mean of two distinct positive numbers is smaller than their arithmetic mean, it follows that \( 0< \eta _i^* < 1.\)
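
Both diagnostics in the notes above can be computed directly from their definitions. The following is a minimal pure-Python sketch (no external libraries), not the authors' code: `vif` takes the diagonal of \(\varvec{R}^{-1}_{xx}\) using a Gauss-Jordan inverse, and `antieigenvalues` pairs the smallest and largest eigenvalues of \(\varvec{X'X}\), obtained here with a cyclic Jacobi eigenvalue routine. All function names are hypothetical, and the example data matrices, including the intercept column of ones in the first `X`, are illustrative assumptions.

```python
from math import atan2, cos, sin, sqrt


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def correlation_matrix(columns):
    """Sample correlation matrix R_xx of the predictor columns."""
    centered = []
    for col in columns:
        mean = sum(col) / len(col)
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular matrix: exact multicollinearity")
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Note 1: VIF_i = [R_xx^{-1}]_ii, which equals 1 / (1 - R_i^2)."""
    r_inv = invert(correlation_matrix(columns))
    return [r_inv[i][i] for i in range(len(columns))]


def symmetric_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues (ascending) of a symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        # stop once the off-diagonal mass is negligible
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # angle chosen so that J^T A J has a zero in position (p, q)
                theta = 0.5 * atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = cos(theta), sin(theta)
                j = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
                j[p][p] = j[q][q] = c
                j[p][q], j[q][p] = s, -s
                a = matmul(transpose(j), matmul(a, j))
    return sorted(a[i][i] for i in range(n))


def antieigenvalues(x):
    """Note 2: eta_i* = 2 sqrt(l_i l_{p-i+1}) / (l_i + l_{p-i+1}), i = 1..[p/2]."""
    lam = symmetric_eigenvalues(matmul(transpose(x), x))
    p = len(lam)
    if lam[0] <= 0:
        raise ValueError("X'X must be positive definite (X of full column rank)")
    # 0-based index i pairs lam[i] with lam[p - 1 - i]
    return [2 * sqrt(lam[i] * lam[p - 1 - i]) / (lam[i] + lam[p - 1 - i])
            for i in range(p // 2)]


print(vif([[1, 2, 3, 4], [1, 3, 2, 4]]))          # two predictors with r = 0.8
print(antieigenvalues([[1, 0], [1, 1], [1, 2]]))  # intercept column plus one predictor
```

With two predictors, \(R^2_i = r^2\), so both VIFs equal \(1/(1-0.8^2) \approx 2.78\), matching the two forms of the definition in Note 1. For real data a linear-algebra library would normally replace the hand-written inverse and Jacobi routine.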

References

  1. Abdi, H.: Partial Least Square Regression (PLS Regression), Encyclopedia for Research Methods for the Social Sciences, pp. 792–795. Sage, Thousand Oaks (2003)

  2. Atkinson, A., Riani, M., Cerioli, A.: Exploring Multivariate Data with the Forward Search. Springer, New York, NY (2004)

  3. Belsley, D.A.: Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley, New York, NY (1991)

  4. Bentler, P.M.: EQS, Structural Equations Program Manual, Program Version 5.0, Encino, CA (1995)

  5. Bentler, P.M., Bonett, D.G.: Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull. 88(3), 588 (1980)

  6. Box, G.E.P., Cox, D.R.: An analysis of transformations. J. R. Stat. Soc. Ser. B (Methodol.) 26(2), 211–252 (1964)

  7. Casella, G., Berger, R.L.: Statistical Inference, 2nd edn. Duxbury, Pacific Grove, CA (2002)

  8. Chatterjee, S., Hadi, A., Price, B.: The Use of Regression Analysis by Example. Wiley, New York, NY (2006)

  9. Cherubini, U., Gobbi, F., Mulinacci, S., Romagnoli, S.: Dynamic Copula Methods in Finance. Wiley, New York, NY (2012)

  10. Daniel, C., Wood, F.S.: Fitting Equations to Data: Computer Analysis of Multifactor Data. Wiley, New York, NY (1999)

  11. Graybill, F.A., Iyer, H.K.: Regression Analysis Concepts and Applications. Duxbury, Belmont, CA (1994)

  12. Joe, H.: Multivariate Models and Multivariate Dependence Concepts. CRC Press, New York, NY (1997)

  13. Kendall, M.G.: Multivariate Analysis. Macmillan, New York, NY (1980)

  14. Khattree, R.: Antieigenvalues, multicollinearity and influence: a revisit to regression diagnostics (preprint) (2016)

  15. Khattree, R., Bahuguna, M.: A revisit to estimation of beta risk and an analysis of stock market through copula transformation and winsorization with S&P 500 index as proxy. J. Index Invest. 8(4), 61–83 (2018)

  16. Khattree, R., Bahuguna, M.: An alternative data analytic approach to measure the univariate and multivariate skewness. Int. J. Data Sci. Anal. https://doi.org/10.1007/s41060-018-0106-1 (2018)

  17. Khattree, R., Naik, D.N.: Applied Multivariate Statistics with SAS Software, 2nd edn. SAS Institute Inc., Cary, NC (1999)

  18. Khattree, R., Naik, D.N.: Multivariate Data Reduction and Discrimination with SAS Software. SAS Institute Inc., Cary, NC (2000)

  19. Kiplinger’s Personal Finance 57(12), 104–123 (2003)

  20. Kutner, M.H., Nachtsheim, C.J., Neter, J., Li, W.: Applied Linear Statistical Models. McGraw-Hill, New York, NY (2005)

  21. Mai, J., Scherer, M.: Financial Engineering with Copulas Explained. Palgrave-Macmillan, London (2014)

  22. McDonald, G.C., Schwing, R.C.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15(3), 463–481 (1973)

  23. McDonald, G.C., Ayers, J.: Some applications of the Chernoff faces: a technique for graphically representing multivariate data. In: Wang, P. (ed.) Graphical Representation of Multivariate Data, pp. 183–197. Academic Press, New York, NY (1978)

  24. Nelsen, R.B.: An Introduction to Copulas. Springer, New York, NY (2006)

  25. Rao, C.R.: Test of significance in multivariate analysis. Biometrika 35, 58–79 (1948)

  26. O’Rourke, N., Hatcher, L.: A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling, 2nd edn. SAS Press, Cary, NC (2013)

  27. SAS Institute: SAS/STAT 12.1 User’s Guide: Survey Data Analysis. SAS Institute Inc., Cary, NC (2012)

  28. Sklar, A.: Distribution functions of \(n\) dimensions and margins. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)

  29. Sklar, A.: Random variables, distribution functions, and copulas: a personal look backward and forward. IMS Lect. Notes Monogr. Ser. Inst. Math. Stat. 28, 1–14 (1996)

  30. TC2000.com: TC2000 Software, Version 7 (2010)

  31. Wang, J., Wang, X.: Structural Equation Modeling: Applications Using Mplus. Wiley, New York, NY (2012)

  32. Wicklin, R.: Generating a random orthogonal matrix, SAS Blogs. http://blogs.sas.com/content/iml/2012/03/28/generating-a-random-orthogonal-matrix.html. SAS Institute Inc. (2012)

  33. Wicklin, R.: Simulating Data with SAS. SAS Institute Inc., Cary, NC (2013)

  34. Wright, S.: On the nature of size factors. Genetics 3(4), 367–374 (1918)

  35. Wold, S., Sjöström, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58(2), 109–130 (2001)

Acknowledgements

The authors wish to thank the associate editor and two referees for their critical comments, which led to improvements in this work.

Author information

Corresponding author

Correspondence to Ravindra Khattree.

Appendix

The data for the factor analysis example were simulated using the distributions listed in Table 14.

Table 14 Simulated data set: distribution of various variables

About this article

Cite this article

Bahuguna, M., Khattree, R. A generic all-purpose transformation for multivariate modeling through copulas. Int J Data Sci Anal 10, 1–23 (2020). https://doi.org/10.1007/s41060-019-00198-w

  • DOI: https://doi.org/10.1007/s41060-019-00198-w
