Abstract
Copulas have been used in various applications in biomedical sciences and finance. We suggest copulas as generic all-purpose transformations that enable one to apply various standard multivariate procedures more efficiently and with better statistical properties and results. More specifically, we consider the problem of transforming any continuous data to multivariate normality, using copulas as a device for defining the transformation. Such a transformation effectively enables us to model a variety of problems involving non-normal data using classical multivariate statistical techniques. We evaluate and illustrate various applications, including those in regression, multicollinearity, principal component analysis, factor analysis, partial least squares modeling and structural equation modeling, where analyses using the appropriate copula transformations result in substantial improvement in implementation, interpretation and prediction, as well as in the corresponding models. Many datasets available in the literature are analyzed, and they amply demonstrate the power of such an approach.
Notes
The variance inflation factor for the ith variable is defined as \((VIF)_i = [\varvec{R}^{-1}_{xx}]_{ii} = \frac{1}{1-R^2_i},\) where \(\varvec{R}_{xx}\) is the sample correlation matrix of the predictor variables and \(R^2_i\) is the \(R^2\) value obtained when \(x_i\) is regressed on \(x_1, x_2, \dots , x_{i-1}, x_{i+1}, \dots ,x_k.\) See [20] for details.
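As a rough illustration of the two equivalent forms of the variance inflation factor in this note, the following NumPy sketch computes it once from the diagonal of the inverse correlation matrix and once from the \(R^2_i\) of each auxiliary regression. The function names and the simulated predictors are hypothetical and are not taken from the paper; the regressions are assumed to include an intercept.

```python
import numpy as np


def vif_from_correlation(X):
    """VIFs as the diagonal entries of the inverse sample correlation matrix."""
    R_xx = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R_xx))


def vif_from_regression(X):
    """VIFs as 1 / (1 - R_i^2), regressing each predictor on all the others."""
    n, k = X.shape
    vifs = np.empty(k)
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        # Assumption: each auxiliary regression includes an intercept term.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[i] = 1.0 / (1.0 - r2)
    return vifs


# Hypothetical simulated predictors: x2 is deliberately correlated with x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.5 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print(vif_from_correlation(X))
print(vif_from_regression(X))
```

Both functions should agree up to floating-point error, and the correlated pair should show visibly larger values than the independent predictor.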
Let \(\lambda _1< \lambda _2< \dots < \lambda _{p}\) be the ordered eigenvalues of \(\varvec{X'X},\) where \(\varvec{X}\) is the \(n \times p,\) \(p = k+1,\) data matrix on the independent variables. Then the ith antieigenvalue of \(\varvec{X'X}\) (Khattree, 2016, [14]) is defined as \(\eta _i^* = 2 \frac{\sqrt{\lambda _i \ \lambda _{p-i+1}}}{\lambda _i + \lambda _{p-i+1}},\) \(i=1,2, \dots , r=[p/2]\) (the integer part of \(p/2\)). Since the geometric mean of two distinct positive numbers is strictly smaller than their arithmetic mean, we have \( 0< \eta _i^* < 1\) whenever \(\lambda _1 > 0.\)
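The antieigenvalue formula in this note can be sketched in a few lines of NumPy. This is only an illustrative computation under stated assumptions: the function name is hypothetical, and the simulated design matrix is assumed to carry a leading column of ones so that \(p = k+1\).

```python
import numpy as np


def antieigenvalues(X):
    """Antieigenvalues of X'X: 2*sqrt(l_i * l_{p-i+1}) / (l_i + l_{p-i+1}), i = 1..[p/2]."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    lam = np.linalg.eigvalsh(X.T @ X)
    # Guard against tiny negative values caused by rounding error.
    lam = np.clip(lam, 0.0, None)
    r = lam.size // 2          # integer part of p / 2
    small = lam[:r]            # lambda_1, ..., lambda_r
    large = lam[::-1][:r]      # lambda_p, ..., lambda_{p-r+1}
    return 2.0 * np.sqrt(small * large) / (small + large)


# Hypothetical design matrix: a column of ones plus k = 3 simulated predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
eta = antieigenvalues(X)
print(eta)
```

Pairing the smallest eigenvalue with the largest, the second smallest with the second largest, and so on, mirrors the index pattern \(\lambda _i, \lambda _{p-i+1}\) in the definition; values close to 0 correspond to widely separated eigenvalue pairs.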
References
Abdi, H.: Partial Least Square Regression (PLS Regression), Encyclopedia for Research Methods for the Social Sciences, pp. 792–795. Sage, Thousand Oaks (2003)
Atkinson, A., Riani, M., Cerioli, A.: Exploring Multivariate Data with The Forward Search. Springer, New York, NY (2004)
Belsley, D.A.: Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley, New York, NY (1991)
Bentler, P.M.: EQS, Structural Equations Program Manual, Program Version 5.0, Encino, CA (1995)
Bentler, P.M., Bonett, D.G.: Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull. 88(3), 588 (1980)
Box, G.E.P., Cox, D.R.: An analysis of transformations. J. R. Stat. Soc. Ser. B (Methodol.) 26(2), 211–252 (1964)
Casella, G., Berger, R.L.: Statistical Inference, 2nd edn. Duxbury, Pacific Grove, CA (2002)
Chatterjee, S., Hadi, A., Price, B.: The Use of Regression Analysis by Example. Wiley, New York, NY (2006)
Cherubini, U., Gobbi, F., Mulinacci, S., Romagnoli, S.: Dynamic Copula Methods in Finance. Wiley, New York, NY (2012)
Daniel, C., Wood, F.S.: Fitting Equations to Data: Computer Analysis of Multifactor Data. Wiley, New York, NY (1999)
Graybill, F.A., Iyer, H.K.: Regression Analysis Concepts and Applications. Duxbury, Belmont, CA (1994)
Joe, H.: Multivariate Models and Multivariate Dependence Concepts. CRC Press, New York, NY (1997)
Kendall, M.G.: Multivariate Analysis. Macmillan, New York, NY (1980)
Khattree, R.: Antieigenvalues, multicollinearity and influence: a revisit to regression diagnostics (preprint) (2016)
Khattree, R., Bahuguna, M.: A revisit to estimation of beta risk and an analysis of stock market through copula transformation and winsorization with S&P 500 index as proxy. J. Index Invest. 8(4), 61–83 (2018)
Khattree, R., Bahuguna, M.: An alternative data analytic approach to measure the univariate and multivariate skewness. Int. J. Data Sci. Anal. https://doi.org/10.1007/s41060-018-0106-1 (2018)
Khattree, R., Naik, D.N.: Applied Multivariate Statistics with SAS Software, 2nd edn. SAS Institute Inc., Cary, NC (1999)
Khattree, R., Naik, D.N.: Multivariate Data Reduction and Discrimination with SAS Software. SAS Institute Inc., Cary, NC (2000)
Kiplinger’s Personal Finance, 57(12), 104–123 (2003)
Kutner, M.H., Nachtsheim, C.J., Neter, J., Li, W.: Applied Linear Statistical Models. McGraw-Hill, New York, NY (2005)
Mai, J., Scherer, M.: Financial Engineering with Copulas Explained. Palgrave-Macmillan, London (2014)
McDonald, G.C., Schwing, R.C.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15(3), 463–481 (1973)
McDonald, G.C., Ayers, J.: Some applications of the Chernoff faces: a technique for graphically representing multivariate data. In: Wang, P. (ed.) Graphical Representation of Multivariate Data, pp. 183–197. Academic Press, New York, NY (1978)
Nelsen, R.B.: An Introduction to Copulas. Springer, New York, NY (2006)
Rao, C.R.: Test of significance in multivariate analysis. Biometrika 35, 58–79 (1948)
O’Rourke, N., Hatcher, L.: A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling, 2nd edn. SAS Press, Cary, NC (2013)
SAS Institute: SAS/STAT 12.1 User’s Guide: Survey Data Analysis. SAS Institute Inc., Cary, NC (2012)
Sklar, A.: Distribution functions of \(n\) dimensions and margins. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Sklar, A.: Random variables, distribution functions, and copulas: a personal look backward and forward. IMS Lect. Notes Monogr. Ser. Inst. Math. Stat. 28, 1–14 (1996)
TC2000.com: TC2000 Software, Version 7 (2010)
Wang, J., Wang, X.: Structural Equation Modeling: Applications Using Mplus. Wiley, New York, NY (2012)
Wicklin, R.: Generating a random orthogonal matrix, SAS Blogs. http://blogs.sas.com/content/iml/2012/03/28/generating-a-random-orthogonal-matrix.html. SAS Institute Inc. (2012)
Wicklin, R.: Simulating Data with SAS. SAS Institute Inc., Cary, NC (2013)
Wright, S.: On the nature of size factors. Genet. Genet. Soc. Am. Bethesda, MD 3(4), 367–374 (1918)
Wold, S., Sjöström, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58(2), 109–130 (2001)
Acknowledgements
The authors wish to thank the associate editor and two referees for their critical comments leading to the improvement in this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
The data for the factor analysis example were simulated using the distributions listed in Table 14.
Cite this article
Bahuguna, M., Khattree, R. A generic all-purpose transformation for multivariate modeling through copulas. Int J Data Sci Anal 10, 1–23 (2020). https://doi.org/10.1007/s41060-019-00198-w
DOI: https://doi.org/10.1007/s41060-019-00198-w