The Analysis of Multivariate Data Using Semi-Definite Programming

Journal of Classification

Abstract

A model is presented for analyzing general multivariate data. The model's prime objective is the dimensionality reduction of the multivariate problem. The only requirement of the model is that the input data to the statistical analysis be a covariance matrix, a correlation matrix, or, more generally, a positive semi-definite matrix. The model is parameterized by a scale parameter and a shape parameter, both of which take non-negative values smaller than unity. We first prove a well-known heuristic for minimizing rank and establish the conditions under which rank can be replaced with trace. This result allows us to solve our rank minimization problem as a Semi-Definite Programming (SDP) problem with any of a number of available solvers. We then apply the model to four case studies dealing with four well-known problems in multivariate analysis. The first problem is determining the number of underlying factors in factor analysis (FA) or the number of retained components in principal component analysis (PCA); it is shown that our model determines the number of factors or components more efficiently than the commonly used methods. The second example deals with a problem that has received much attention in recent years because of its wide applications: sparse principal components and variable selection in PCA. When applied to the data set known in the literature as the pitprop data, our approach yields PCs with larger variances than PCs derived from other approaches. The third problem concerns sensitivity analysis of multivariate models, a topic not widely researched in the literature because of its difficulty. Finally, we apply the model to a difficult problem in PCA known as lack of scale invariance in the solutions of PCA: the solutions derived from analyzing the covariance matrix are generally different from (and not linearly related to) the solutions derived from analyzing the correlation matrix. With our model, we obtain the same solution whether we analyze the correlation matrix or the covariance matrix, since the analysis uses only the signs of the correlations/covariances and not their values. This leads us to introduce a new type of PCA, called Sign PCA, whose applications in the social sciences and other fields we speculate on.
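
The core computational device mentioned in the abstract, replacing rank with trace so that the rank minimization can be handed to an off-the-shelf SDP solver, can be sketched in a few lines of Python. The snippet below is a minimal illustration built on the cvxpy library, not the paper's actual model: the entrywise-closeness constraint, the tolerance eps, and the eigenvalue cutoff used to read off the number of retained components are all assumptions standing in for the scale- and shape-parameterized formulation the authors describe.

    # Minimal sketch (assumed setup, not the paper's model): the trace heuristic
    # for rank minimization over positive semi-definite matrices, posed as an
    # SDP and solved with a generic solver through cvxpy.
    import numpy as np
    import cvxpy as cp

    def low_rank_psd_fit(R, eps=0.1):
        """Return a PSD matrix close to R with heuristically small rank,
        using trace(X) as the convex surrogate for rank(X)."""
        p = R.shape[0]
        X = cp.Variable((p, p), PSD=True)       # decision variable, constrained PSD
        objective = cp.Minimize(cp.trace(X))    # trace stands in for rank
        constraints = [cp.abs(X - R) <= eps]    # stay entrywise close to R (assumed constraint)
        cp.Problem(objective, constraints).solve(solver=cp.SCS)
        return X.value

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.standard_normal((200, 6))
        R = np.corrcoef(data, rowvar=False)     # 6 x 6 correlation matrix
        X = low_rank_psd_fit(R)
        evals = np.linalg.eigvalsh(X)
        # Eigenvalues above a small threshold give the effective rank,
        # i.e. a suggested number of components or factors to retain.
        print(int(np.sum(evals > 1e-6 * evals.max())))

Read literally, the Sign PCA remark would correspond to passing np.sign(R) in place of R, which gives the same input whether R is a covariance or a correlation matrix because the two share the same sign pattern; that reading is an inference from the abstract, not the authors' stated procedure.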

References

  • AL-IBRAHIM, A.H., and AL-KANDARI, N. (2008), “Stability of the Principal Components”, Computational Statistics, 23, 153–171.

  • ANWAR, M.A., AL-KANDARI, N., and AL-QALLAF, C.L. (2004), “Use of Bostick’s Library Anxiety Scale on Undergraduate Biological Sciences Students of Kuwait University”, Library and Information Science Research, 26, 266–283.

  • BOSTICK, S.L. (1992), “The Development and Validation of the Library Anxiety Scale”, PhD dissertation, Wayne State University, USA.

  • BOYD, S., and VANDENBERGHE, L. (2004), Convex Optimization, Cambridge UK: Cambridge University Press.

  • CADIMA, J., and JOLLIFFE, I. (1995), “Loadings and Correlations in the Interpretation of Principal Components”, Journal of Applied Statistics, 22, 203–214.

  • CANDÈS, E.J., and RECHT, B. (2008), “Exact Matrix Completion via Convex Optimization”, Foundations of Computational Mathematics, 9, 717–772.

  • D'ASPREMONT, A., EL GHAOUI, L., JORDAN, M.I., and LANCKRIET, G.R.G. (2004), “A Direct Formulation for Sparse PCA Using Semidefinite Programming”, in Advances in Neural Information Processing Systems (NIPS), Vancouver BC, reprinted in 2007 in SIAM Review, 49(3), 434–448.

  • ECKART, C., and YOUNG, G. (1936), “The Approximation of One Matrix by Another of Lower Rank”, Psychometrika, 1, 211–218.

  • GIFI, A. (1990), Nonlinear Multivariate Analysis, Chichester: Wiley.

  • HAN, J., and KAMBER, M. (2006), Data Mining: Concepts and Techniques (2nd ed.), San Francisco CA: Morgan Kaufmann Publishers.

  • JEFFERS, J.N.R. (1967), “Two Case Studies in the Applications of Principal Component Analysis”, Applied Statistics, 16, 225–236.

  • JOLLIFFE, I.T. (2002), Principal Component Analysis (2nd ed.), New York NY: Springer Verlag.

  • JOLLIFFE I.T., and UDDIN, M. (2003), “A Modified Principal Component Technique Based on the Lasso”, Journal of Computational and Graphical Statistics, 12, 531–547.

  • LOEHLIN, J.C. (1998), Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis, Mahwah NJ: Lawrence Erlbaum Associates.

  • MESBAHI, M. (1999), “On the Semi-Definite Programming Solution of the Least Order Dynamic Output Feedback Synthesis”, in Proceedings of the American Control Conference, pp. 2355–2359.

  • MOGHADDAM, B., WEISS, Y., and AVIDAN, S. (2006), “Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms”, Advances in Neural Information Processing Systems, 18, 915–922.

  • PATAKI, G. (1998), “On the Rank of Extreme Matrices in Semidefinite Programs and the Multiplicity of Optimal Eigenvalues”, Mathematics of Operations Research, 23(2), 339–358.

  • RENCHER, A.C. (1995), Methods of Multivariate Analysis, New York: Wiley.

  • RENCHER, A.C. (1998), Multivariate Statistical Inference and Applications, New York: Wiley.

  • ROHDE, A., and TSYBAKOV, A. (2011), “Estimation Of High-Dimensional Low-Rank Matrices”, Annals of Statistics, 39(2), 887–930.

  • SAGNOL, G. (2011), “A Class of Semidefinite Programs with Rank-One Solutions”, Linear Algebra and Its Applications, 435(6), 1446–1463.

  • SHAWE-TAYLOR, J., and CRISTIANINI, N. (2004), Kernel Methods for Pattern Analysis, Cambridge UK: Cambridge University Press.

  • YOUNG, G., and HOUSEHOLDER, A. S. (1938), “Discussion of a Set of Points in Terms of Their Mutual Distances”, Psychometrika, 3, 19–22.

  • ZOU, H., HASTIE, T., and TIBSHIRANI, R. (2006), “Sparse Principal Component Analysis”, Journal of Computational and Graphical Statistics, 15, 265–286.

Author information

Corresponding author

Correspondence to A.H. Al-Ibrahim.

Cite this article

Al-Ibrahim, A. The Analysis of Multivariate Data Using Semi-Definite Programming. J Classif 32, 382–413 (2015). https://doi.org/10.1007/s00357-015-9184-0
