Stability approach to selecting the number of principal components

  • Original Paper
Computational Statistics

Abstract

Principal component analysis (PCA) is a canonical tool that reduces data dimensionality by finding linear transformations that project the data onto a lower-dimensional subspace while preserving the variability of the data. Selecting the number of principal components (PCs) is essential but challenging because PCA is an unsupervised learning problem with no clear target label at the sample level. In this article, we propose a new method to determine the optimal number of PCs based on the stability of the space spanned by the PCs. A series of analyses with both synthetic and real data demonstrates the superior performance of the proposed method.
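
To make the idea of "stability of the space spanned by PCs" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm, which this page does not describe. It assumes one plausible reading: split the sample into two random halves, compute the top-k principal subspace on each half, and measure how far apart the two subspaces are. The function names (`top_k_projection`, `subspace_instability`, `select_num_pcs`, `make_demo_data`), the half-sample splitting scheme, the distance measure, and the "pick the most stable k" rule are all assumptions made for illustration.

```python
import numpy as np


def top_k_projection(X, k):
    """Projection matrix onto the span of the top-k principal directions of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    return V @ V.T


def subspace_instability(X, k, n_splits=20, seed=0):
    """Average distance between top-k subspaces fitted on disjoint half-samples.

    Assumed measure: ||P1 - P2||_F^2 / (2k), which equals the mean squared sine
    of the principal angles between the two subspaces and lies in [0, 1].
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    half = n // 2
    dists = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        P1 = top_k_projection(X[perm[:half]], k)
        P2 = top_k_projection(X[perm[half:2 * half]], k)
        dists.append(np.linalg.norm(P1 - P2, "fro") ** 2 / (2 * k))
    return float(np.mean(dists))


def select_num_pcs(X, k_max=None, n_splits=20, seed=0):
    """Hypothetical selection rule: the k in 1..k_max with the most stable subspace.

    k equal to the full dimension is excluded because the full space is
    trivially stable.
    """
    p = X.shape[1]
    if k_max is None:
        k_max = p - 1
    scores = [subspace_instability(X, k, n_splits, seed) for k in range(1, k_max + 1)]
    return int(np.argmin(scores)) + 1


def make_demo_data(n=400, p=10, rank=3, signal_sd=5.0, seed=1):
    """Synthetic data: `rank` equally strong directions plus isotropic unit noise."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(p, rank)))  # orthonormal signal basis
    scores = rng.normal(scale=signal_sd, size=(n, rank))
    return scores @ Q.T + rng.normal(size=(n, p))


if __name__ == "__main__":
    X = make_demo_data()
    for k in range(1, X.shape[1]):
        print(k, round(subspace_instability(X, k), 4))
    print("selected k:", select_num_pcs(X))
```

In this toy setup the signal directions are given equal variance on purpose, so that subspaces of dimension smaller than the true rank are unstable as well as larger ones. With well-separated signal variances, smaller subspaces can also be stable, and a plain minimum over k would need a more careful rule.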

Author information

Correspondence to Seung Jun Shin.

Additional information

This work was supported by the National Research Foundation of Korea (NRF), Grant No. 2015R1C1A1A01054913.

Electronic supplementary material

Supplementary material 1 (pdf 119 KB)

About this article

Cite this article

Song, J., Shin, S.J. Stability approach to selecting the number of principal components. Comput Stat 33, 1923–1938 (2018). https://doi.org/10.1007/s00180-018-0826-7

  • DOI: https://doi.org/10.1007/s00180-018-0826-7
