Abstract
Principal component analysis (PCA) is a canonical dimension-reduction tool: it finds linear transformations that project the data onto a lower-dimensional subspace while preserving as much of the data's variability as possible. Selecting the number of principal components (PCs) is essential yet challenging, since PCA is an unsupervised learning problem with no target label at the sample level. In this article, we propose a new method for determining the optimal number of PCs based on the stability of the subspace spanned by the PCs. A series of analyses on both synthetic and real data demonstrates the superior performance of the proposed method.
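The abstract's idea — keep the number of PCs for which the spanned subspace is reproducible across perturbations of the data — can be sketched as follows. This is an illustrative reconstruction, not the authors' exact algorithm: the subsampling scheme, the affinity measure (mean squared canonical correlation between two estimated subspaces), and the 0.95 stability cutoff are all assumptions made for this sketch.

```python
import numpy as np

def subspace_affinity(U, V):
    """Similarity in [0, 1] of two k-dim subspaces with orthonormal bases U, V.

    Mean squared canonical correlation: 1 when the subspaces coincide,
    near 0 when they are close to orthogonal.
    """
    k = U.shape[1]
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sum(s ** 2) / k)

def top_pcs(X, k):
    """Orthonormal basis (p x k) of the top-k principal component directions."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data matrix are the PC loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def select_k_by_stability(X, k_max=6, n_splits=20, frac=0.5, thresh=0.95, seed=0):
    """Largest k whose PC subspace stays reproducible across random subsamples.

    `thresh` is an illustrative stability cutoff, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    scores = []
    for k in range(1, k_max + 1):
        affs = [
            subspace_affinity(
                top_pcs(X[rng.choice(n, size=m, replace=False)], k),
                top_pcs(X[rng.choice(n, size=m, replace=False)], k),
            )
            for _ in range(n_splits)
        ]
        scores.append(float(np.mean(affs)))
    stable = [k for k, s in zip(range(1, k_max + 1), scores) if s >= thresh]
    return (max(stable) if stable else 1), scores

# Toy data: a 2-dimensional signal (distinct scales) plus weak isotropic noise.
rng = np.random.default_rng(1)
n, p = 400, 20
Q, _ = np.linalg.qr(rng.normal(size=(p, 2)))  # orthonormal signal basis
X = (rng.normal(size=(n, 2)) * np.array([5.0, 3.0])) @ Q.T \
    + 0.1 * rng.normal(size=(n, p))

k_hat, scores = select_k_by_stability(X)
# The subspace should be stable up to k = 2 and degrade once noise
# directions enter, so k_hat should recover the planted rank of 2.
```

The key design point is that subspaces, not individual loading vectors, are compared: sign flips and within-subspace rotations of the PCs leave the affinity unchanged, so only genuine instability (noise directions entering the span) lowers the score.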
Additional information
This work was supported by National Research Foundation of Korea (NRF) Grant No. 2015R1C1A1A01054913.
Song, J., Shin, S.J. Stability approach to selecting the number of principal components. Comput Stat 33, 1923–1938 (2018). https://doi.org/10.1007/s00180-018-0826-7