Abstract
Principal component analysis (PCA) is a canonical dimension-reduction tool: it finds linear transformations that project the data onto a lower-dimensional subspace while preserving as much of the data's variability as possible. Selecting the number of principal components (PCs) is essential yet challenging, since PCA is an unsupervised learning problem with no target label at the sample level. In this article, we propose a new method for determining the optimal number of PCs based on the stability of the subspace spanned by the PCs. A series of analyses on both synthetic and real data demonstrates the superior performance of the proposed method.
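The abstract's idea — keep the number of PCs for which the spanned subspace is reproducible across perturbations of the data — can be sketched as follows. This is an illustrative reconstruction, not the authors' exact algorithm: the subsampling scheme, the affinity measure (mean squared canonical correlation between two estimated subspaces), and the 0.95 stability cutoff are all assumptions made for this sketch.

```python
import numpy as np

def subspace_affinity(U, V):
    """Similarity in [0, 1] of two k-dim subspaces with orthonormal bases U, V.

    Mean squared canonical correlation: 1 when the subspaces coincide,
    near 0 when they are close to orthogonal.
    """
    k = U.shape[1]
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sum(s ** 2) / k)

def top_pcs(X, k):
    """Orthonormal basis (p x k) of the top-k principal component directions."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data matrix are the PC loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def select_k_by_stability(X, k_max=6, n_splits=20, frac=0.5, thresh=0.95, seed=0):
    """Largest k whose PC subspace stays reproducible across random subsamples.

    `thresh` is an illustrative stability cutoff, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    scores = []
    for k in range(1, k_max + 1):
        affs = [
            subspace_affinity(
                top_pcs(X[rng.choice(n, size=m, replace=False)], k),
                top_pcs(X[rng.choice(n, size=m, replace=False)], k),
            )
            for _ in range(n_splits)
        ]
        scores.append(float(np.mean(affs)))
    stable = [k for k, s in zip(range(1, k_max + 1), scores) if s >= thresh]
    return (max(stable) if stable else 1), scores

# Toy data: a 2-dimensional signal (distinct scales) plus weak isotropic noise.
rng = np.random.default_rng(1)
n, p = 400, 20
Q, _ = np.linalg.qr(rng.normal(size=(p, 2)))  # orthonormal signal basis
X = (rng.normal(size=(n, 2)) * np.array([5.0, 3.0])) @ Q.T \
    + 0.1 * rng.normal(size=(n, p))

k_hat, scores = select_k_by_stability(X)
# The subspace should be stable up to k = 2 and degrade once noise
# directions enter, so k_hat should recover the planted rank of 2.
```

The key design point is that subspaces, not individual loading vectors, are compared: sign flips and within-subspace rotations of the PCs leave the affinity unchanged, so only genuine instability (noise directions entering the span) lowers the score.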
Additional information
This work was supported by National Research Foundation of Korea (NRF) Grant No. 2015R1C1A1A01054913.
Song, J., Shin, S.J. Stability approach to selecting the number of principal components. Comput Stat 33, 1923–1938 (2018). https://doi.org/10.1007/s00180-018-0826-7