The Analysis of Multivariate Data Using Semi-Definite Programming

Journal of Classification

Abstract

A model is presented for analyzing general multivariate data. The model's prime objective is the dimensionality reduction of the multivariate problem. The only requirement of the model is that the input data to the statistical analysis be a covariance matrix, a correlation matrix, or, more generally, a positive semi-definite matrix. The model is parameterized by a scale parameter and a shape parameter, both of which take non-negative values smaller than unity. We first prove a well-known heuristic for minimizing rank and establish the conditions under which rank can be replaced with trace. This result allows us to solve our rank minimization problem as a Semi-Definite Programming (SDP) problem with any of a number of available solvers. We then apply the model to four case studies dealing with four well-known problems in multivariate analysis. The first problem is determining the number of underlying factors in factor analysis (FA) or the number of retained components in principal component analysis (PCA); it is shown that our model determines the number of factors or components more efficiently than the commonly used methods. The second example deals with a problem that has received much attention in recent years because of its wide applications: sparse principal components and variable selection in PCA. When applied to the data set known in the literature as the pitprop data, our approach yields PCs with larger variances than PCs derived from other approaches. The third problem concerns sensitivity analysis of multivariate models, a topic not widely researched in the literature because of its difficulty. Finally, we apply the model to a difficult problem in PCA known as lack of scale invariance in the solutions of PCA: the solutions derived from analyzing the covariance matrix are generally different from (and not linearly related to) the solutions derived from analyzing the correlation matrix. With our model, we obtain the same solution whether we analyze the correlation matrix or the covariance matrix, since the analysis uses only the signs of the correlations/covariances and not their values. This leads us to introduce a new type of PCA, called Sign PCA, whose applications in the social sciences and other fields we speculate on.
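
The core computational device mentioned in the abstract, replacing rank with trace so that the rank minimization can be handed to an off-the-shelf SDP solver, can be sketched in a few lines of Python. The snippet below is a minimal illustration built on the cvxpy library, not the paper's actual model: the entrywise-closeness constraint, the tolerance eps, and the eigenvalue cutoff used to read off the number of retained components are all assumptions standing in for the scale- and shape-parameterized formulation the authors describe.

    # Minimal sketch (assumed setup, not the paper's model): the trace heuristic
    # for rank minimization over positive semi-definite matrices, posed as an
    # SDP and solved with a generic solver through cvxpy.
    import numpy as np
    import cvxpy as cp

    def low_rank_psd_fit(R, eps=0.1):
        """Return a PSD matrix close to R with heuristically small rank,
        using trace(X) as the convex surrogate for rank(X)."""
        p = R.shape[0]
        X = cp.Variable((p, p), PSD=True)       # decision variable, constrained PSD
        objective = cp.Minimize(cp.trace(X))    # trace stands in for rank
        constraints = [cp.abs(X - R) <= eps]    # stay entrywise close to R (assumed constraint)
        cp.Problem(objective, constraints).solve(solver=cp.SCS)
        return X.value

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.standard_normal((200, 6))
        R = np.corrcoef(data, rowvar=False)     # 6 x 6 correlation matrix
        X = low_rank_psd_fit(R)
        evals = np.linalg.eigvalsh(X)
        # Eigenvalues above a small threshold give the effective rank,
        # i.e. a suggested number of components or factors to retain.
        print(int(np.sum(evals > 1e-6 * evals.max())))

Read literally, the Sign PCA remark would correspond to passing np.sign(R) in place of R, which gives the same input whether R is a covariance or a correlation matrix because the two share the same sign pattern; that reading is an inference from the abstract, not the authors' stated procedure.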

References

  • AL-IBRAHIM, A.H., and AL-KANDARI, N. (2008), “Stability of the Principal Components”, Computational Statistics, 23, 153–171.

  • ANWAR, M.A., AL-KANDARI, N., and AL-QALLAF, C.L. (2004), “Use of Bostick’s Library Anxiety Scale on Undergraduate Biological Sciences Students of Kuwait University”, Library and Information Science Research, 26, 266–283.

  • BOSTICK, S.L. (1992), “The Development and Validation of the Library Anxiety Scale”, PhD dissertation, Wayne State University, USA.

  • BOYD, S., and VANDENBERGHE, L. (2004), Convex Optimization, Cambridge UK: Cambridge University Press.

  • CADIMA, J., and JOLLIFFE, I. (1995), “Loadings and Correlations in the Interpretation of Principal Components”, Journal of Applied Statistics, 22, 203–214.

  • CANDÈS, E.J., and RECHT, B. (2008), “Exact Matrix Completion via Convex Optimization”, Foundations of Computational Mathematics, 9, 717–772.

  • D'ASPREMONT, A., EL GHAOUI, L., JORDAN, M.I., and LANCKRIET, G.R.G. (2004), “A Direct Formulation for Sparse PCA Using Semidefinite Programming”, in Advances in Neural Information Processing Systems (NIPS), Vancouver BC, reprinted in 2007 in SIAM Review, 49(3), 434–448.

  • ECKART, C., and YOUNG, G. (1936), “The Approximation of One Matrix by Another of Lower Rank”, Psychometrika, 1, 211–218.

  • GIFI, A. (1990), Nonlinear Multivariate Analysis, Chichester: Wiley.

  • HAN, J., and KAMBER, M. (2006), Data Mining: Concepts and Techniques (2nd ed.), San Francisco CA: Morgan Kaufmann Publishers.

  • JEFFERS, J.N.R. (1967), “Two Case Studies in the Applications of Principal Component Analysis”, Applied Statistics, 16, 225–236.

  • JOLLIFFE, I.T. (2002), Principal Component Analysis (2nd ed.), New York NY: Springer Verlag.

  • JOLLIFFE I.T., and UDDIN, M. (2003), “A Modified Principal Component Technique Based on the Lasso”, Journal of Computational and Graphical Statistics, 12, 531–547.

  • LOEHLIN, J.C. (1998), Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis, Mahwah NJ: Lawrence Erlbaum Associates.

  • MESBAHI, M. (1999), “On the Semi-Definite Programming Solution of the Least Order Dynamic Output Feedback Synthesis”, in Proceedings of the American Control Conference, pp. 2355–2359.

  • MOGHADDAM, B., WEISS, Y., and AVIDAN, S. (2006), “Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms”, Advances in Neural Information Processing Systems, 18, 915–922.

  • PATAKI, G. (1998), “On the Rank of Extreme Matrices in Semidefinite Programs and the Multiplicity of Optimal Eigenvalues”, Mathematics of Operations Research, 23(2), 339–358.

  • RENCHER, A.C. (1995), Methods of Multivariate Analysis, New York: Wiley.

  • RENCHER, A.C. (1998), Multivariate Statistical Inference and Applications, New York: Wiley.

  • ROHDE, A., and TSYBAKOV, A. (2011), “Estimation Of High-Dimensional Low-Rank Matrices”, Annals of Statistics, 39(2), 887–930.

  • SAGNOL, G. (2011), “A Class of Semidefinite Programs with Rank-One Solutions”, Linear Algebra and Its Applications, 435(6), 1446–1463.

  • SHAWE-TAYLOR, J., and CRISTIANINI, N. (2004), Kernel Methods for Pattern Analysis, Cambridge UK: Cambridge University Press.

  • YOUNG, G., and HOUSEHOLDER, A. S. (1938), “Discussion of a Set of Points in Terms of Their Mutual Distances”, Psychometrika, 3, 19–22.

  • ZOU, H., HASTIE, T., and TIBSHIRANI, R. (2006), “Sparse Principal Component Analysis”, Journal of Computational and Graphical Statistics, 15, 265–286.

Author information

Corresponding author

Correspondence to A.H. Al-Ibrahim.

Cite this article

Al-Ibrahim, A. The Analysis of Multivariate Data Using Semi-Definite Programming. J Classif 32, 382–413 (2015). https://doi.org/10.1007/s00357-015-9184-0
