Abstract
Many robust variants of Principal Component Analysis (PCA) remove outliers from the data and compute the principal components of the remaining points. The robust centered variant requires knowledge of the center of the non-outliers. Unfortunately, that center is unknown until the outliers have been determined, and using an inaccurate center may lead to the detection of the wrong outliers. We demonstrate this problem in several known robust PCA algorithms. We describe a method that implicitly centers the non-outliers, implemented by appending a constant value (bias) to each data point. This bias method can be used with “black box” robust PCA algorithms by augmenting their input, with minimal change to the algorithm itself.
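The bias trick summarized above can be sketched in a few lines: appending a constant coordinate turns a k-dimensional affine subspace into a (k+1)-dimensional linear subspace through the origin, so an uncentered subspace-fitting routine recovers the data without explicit centering. In the sketch below, `uncentered_pca` is a plain SVD stand-in for whatever robust black-box algorithm is used, and the function names and bias value are illustrative, not from the paper.

```python
import numpy as np

def add_bias(X, bias=10.0):
    """Append a constant bias coordinate to every data point (row of X)."""
    return np.hstack([X, np.full((X.shape[0], 1), bias)])

def uncentered_pca(X, k):
    """Top-k right singular vectors: a k-dim linear subspace through 0.
    Stands in for any black-box robust PCA that skips centering."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

# Points on an affine line y = 2t + 5 (does not pass through the origin).
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, 2.0 * t + 5.0])

# Without the bias, a rank-1 uncentered fit misses the offset ...
V1 = uncentered_pca(X, k=1)
bad = np.linalg.norm(X - X @ V1 @ V1.T)

# ... but after appending the bias, the same routine with one extra
# dimension fits exactly: the data are centered implicitly.
Xb = add_bias(X)
V2 = uncentered_pca(Xb, k=2)
good = np.linalg.norm(Xb - Xb @ V2 @ V2.T)
```

Because the augmented inliers span an exact 2-dimensional linear subspace of R³, the residual `good` is zero up to floating-point error, while `bad` remains large.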
Notes
There is a sign ambiguity in computing eigenvectors: whenever \(v\) is an eigenvector with eigenvalue \(\lambda\), so is \(-v\). All eigenvectors in Fig. 2 were normalized so that their coordinate sum is positive.
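The normalization described in this note is straightforward to apply after any eigendecomposition. A minimal sketch (the helper name `fix_sign` is ours, not from the paper):

```python
import numpy as np

def fix_sign(V):
    """Flip each eigenvector (column of V) so its coordinate sum is positive."""
    signs = np.sign(V.sum(axis=0))
    signs[signs == 0] = 1.0          # leave zero-sum vectors unchanged
    return V * signs

# Example: eigenvectors of a symmetric matrix, sign-normalized.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
w, V = np.linalg.eigh(A)
Vn = fix_sign(V)
# Flipping a sign preserves the eigenvector property: A @ (-v) = lambda * (-v).
```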
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Responsible editor: Srinivasan Parthasarathy
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Guihong Wan and Baokun He have contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wan, G., He, B. & Schweitzer, H. The art of centering without centering for robust principal component analysis. Data Min Knowl Disc 38, 699–724 (2024). https://doi.org/10.1007/s10618-023-00976-y