
Certifiably optimal sparse principal component analysis


Abstract

This paper addresses the sparse principal component analysis (SPCA) problem for covariance matrices of dimension \(n\), seeking solutions of sparsity \(k\) using mixed integer optimization. We propose a tailored branch-and-bound algorithm, Optimal-SPCA, that solves SPCA to certifiable optimality in seconds for problems with \(n\) in the 100s and \(k\) in the 10s. The same algorithm can be applied to problems with \(n\) in the 10,000s or higher to find high-quality feasible solutions in seconds, while taking several hours to prove optimality. We apply our methods to a number of real data sets to demonstrate that our approach scales to the same problem sizes attempted by other methods, while providing superior solutions that explain a higher portion of variance and permit complete control over the desired sparsity. The software that was reviewed as part of this submission has been given the DOI (digital object identifier) https://doi.org/10.5281/zenodo.2027898.
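For reference, the SPCA problem referred to above can be written in its standard form; this is the textbook statement, with \(\Sigma\) denoting the \(n \times n\) covariance matrix and \(k\) the target sparsity, and the paper's exact formulation should be taken from the full text:

\[
\max_{x \in \mathbb{R}^{n}} \; x^{\top} \Sigma x
\quad \text{subject to} \quad \|x\|_{2} = 1, \quad \|x\|_{0} \le k,
\]

where \(\|x\|_{0}\) counts the nonzero entries of \(x\), so a feasible \(x\) is a unit-norm loading vector with at most \(k\) nonzero components.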



Author information


Corresponding author

Correspondence to Dimitris Bertsimas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Overview of the Optimal-SPCA implementation in Julia

The linked repository contains an implementation of Optimal-SPCA written in Julia 0.6.0. The latest version of this software is available on GitHub at https://github.com/lauren897/Optimal-SPCA. The Algorithm directory contains the Julia files that comprise the algorithm, and the Data directory contains an example dataset.

To run this software, install Julia from http://julialang.org/downloads/. The code was last tested before publication with Julia 0.6.0, the most recent release at that time.

Two Julia packages must be installed before the code can be run: DataFrames and StatsBase. They can be added by running Pkg.add("DataFrames") and Pkg.add("StatsBase"), respectively.

At this point, the file test.jl should run successfully. To run the script, navigate to the Algorithm directory and run include("test.jl"). The script runs Optimal-SPCA on the Pitprops dataset, then generates an additional random problem and runs the algorithm on it. Finally, it identifies the first few sparse principal components by applying Optimal-SPCA sequentially, reporting the cumulative variance explained.
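Concretely, the install-and-run sequence from the two preceding paragraphs looks as follows at a Julia 0.6 REPL. The package names, directory name, and script name are exactly those given above; the local clone location of the repository is an assumption:

    # One-time setup: install the two required packages (Julia 0.6 package manager).
    Pkg.add("DataFrames")
    Pkg.add("StatsBase")

    # Run the bundled example. This assumes the GitHub repository has been cloned
    # locally and that the REPL was started from the directory containing the clone.
    cd("Optimal-SPCA/Algorithm")   # the repository's Algorithm directory
    include("test.jl")             # Pitprops example, a random problem, sequential components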

The key method in the algorithm is branchAndBound. It takes two required arguments: prob and k. The variable prob is a custom type that holds the original data as well as the covariance matrix associated with the problem. (If raw data is not available, the Cholesky factorization of the covariance matrix will suffice.) The data is presented as an \(m \times n\) array, with \(m\) data points in \(n\) dimensions; the corresponding covariance matrix is \(n \times n\). The parameter k is a positive integer less than \(n\) representing the desired sparsity.

By default, branchAndBound solves the problem and returns the objective function value, the solution vector, and a few performance metrics, including time elapsed and the number of nodes explored. There are many optional parameters, some of which are discussed in detail in our paper. Other parameters concern technical aspects of the algorithm, such as convergence criteria and array resizing; these are documented in detail in the branchAndBound.jl file where the function is defined.
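The sketch below illustrates the data conventions and the default call described above. It is a hypothetical usage sketch, not the documented API: the construction of prob is only indicated in a comment, because its constructor is defined in the repository's Algorithm files, and the names and packaging of the returned values should be checked against branchAndBound.jl.

    using StatsBase                  # installed above

    m, n, k = 200, 50, 10            # m data points, n dimensions, target sparsity k < n
    X = randn(m, n)                  # m-by-n data array, one observation per row
    Sigma = cov(X)                   # corresponding n-by-n covariance matrix

    # prob wraps the data and covariance matrix (or a Cholesky factor of the
    # covariance matrix if raw data is unavailable); its custom type and
    # constructor are defined in the Algorithm directory of the repository.
    # result = branchAndBound(prob, k)
    #
    # By default the call returns the objective value, the sparse loading vector,
    # and performance metrics such as elapsed time and the number of nodes explored.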


About this article


Cite this article

Berk, L., Bertsimas, D. Certifiably optimal sparse principal component analysis. Math. Prog. Comp. 11, 381–420 (2019). https://doi.org/10.1007/s12532-018-0153-6


