Abstract
Principal component analysis is a dimensionality-reduction method based on the eigensystem of the covariance matrix of a set of multivariate observations. Analyzing how specific observations affect this eigensystem is therefore central to studying the sensitivity of the results. In this framework, approximations of the perturbed eigenvalues and eigenvectors obtained when one or several observations are deleted are computationally useful: they allow one to evaluate the effects of these observations without recomputing the exact perturbed eigenvalues and eigenvectors. However, some of the approximations that have been suggested rest on an incorrect application of matrix perturbation theory. The aim of this short note is to provide the correct formulations and to illustrate them with a numerical study.
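To make the setting concrete, here is a minimal sketch, assuming NumPy, of the generic first-order perturbation expansions for a symmetric matrix with simple (distinct) eigenvalues, applied to the deletion of one observation. It is not the corrected formulation of this note; it only shows the kind of computation involved. The helper names `covariance`, `deletion_perturbation` and `first_order_eigensystem` are hypothetical, and the covariance divisor n (rather than n - 1) is an assumption. The exact change of the covariance matrix uses the standard rank-one identity S_(i) = n/(n-1) S - n/(n-1)^2 d d^T with d = x_i - x̄, so the reduced data set never has to be rebuilt.

```python
import numpy as np


def covariance(X):
    """Covariance matrix with divisor n (assumption); rows of X are observations."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def deletion_perturbation(X, i):
    """Return S and the exact change dS = S_(i) - S when row i is deleted.

    Uses the rank-one identity S_(i) = n/(n-1) S - n/(n-1)^2 d d^T,
    with d = x_i - mean(X), valid for the divisor-n covariance above.
    """
    n = X.shape[0]
    S = covariance(X)
    d = X[i] - X.mean(axis=0)
    S_del = n / (n - 1) * S - n / (n - 1) ** 2 * np.outer(d, d)
    return S, S_del - S


def first_order_eigensystem(S, dS):
    """First-order approximations of the eigenpairs of S + dS.

    Assumes the eigenvalues of S are distinct (simple).
    """
    lam, V = np.linalg.eigh(S)            # ascending eigenvalues, orthonormal eigenvectors
    C = V.T @ dS @ V                      # C[j, k] = v_j^T dS v_k
    lam_approx = lam + np.diag(C)         # lambda_k + v_k^T dS v_k
    gaps = lam[None, :] - lam[:, None]    # gaps[j, k] = lambda_k - lambda_j
    np.fill_diagonal(gaps, np.inf)        # removes the j == k term from the sum
    # Column k: v_k + sum_{j != k} C[j, k] / (lambda_k - lambda_j) * v_j
    V_approx = V + V @ (C / gaps)
    return lam, lam_approx, V_approx


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])
S, dS = deletion_perturbation(X, 0)
lam, lam_approx, V_approx = first_order_eigensystem(S, dS)
lam_exact = np.linalg.eigvalsh(S + dS)  # exact perturbed eigenvalues, for comparison
print(np.abs(lam_approx - lam_exact).max(), np.abs(lam - lam_exact).max())
```

The printed pair compares the first-order error with the size of the exact eigenvalue change; the expansions are only valid when the perturbation is small relative to the gaps between eigenvalues.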
Acknowledgements
The author is grateful to the reviewers for their careful reading of the paper and their helpful comments.
Cite this article
Bénasséni, J. A correction of approximations used in sensitivity study of principal component analysis. Comput Stat 33, 1939–1955 (2018). https://doi.org/10.1007/s00180-017-0790-7