Robust kernel principal component analysis and classification

  • Regular Article
  • Published in Advances in Data Analysis and Classification (2010)

Abstract

Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high-dimensional kernel feature space. The sensitivity of linear PCA to outliers is well known, and various robust alternatives have been proposed in the literature. For KPCA, such robust versions have received considerably less attention. In this article we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit and ROBPCA. These robust KPCA algorithms are analyzed in a classification context by applying discriminant analysis to the KPCA scores. The performance of the different robust KPCA algorithms is studied in a simulation study comparing misclassification percentages on both clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real-life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
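To fix ideas, the sketch below illustrates the classical, non-robust KPCA-plus-classification pipeline that the article builds on and robustifies: an RBF kernel matrix is double-centred, its leading eigenpairs yield the KPCA scores (Schölkopf et al. 1998), and a simple discriminant rule is applied to those scores. This is a minimal sketch only; the RBF kernel, the bandwidth gamma, the number of components q and the nearest-class-mean rule standing in for discriminant analysis are illustrative assumptions, not the robust spherical PCA, projection pursuit or ROBPCA kernelizations proposed in the paper.

    import numpy as np

    def rbf_kernel(X, Y, gamma=0.5):
        # K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2); gamma is an illustrative choice
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def kpca_fit(K, q):
        # Classical KPCA: double-centre the kernel matrix, keep the q leading eigenpairs
        n = K.shape[0]
        H = np.eye(n) - np.full((n, n), 1.0 / n)       # centring matrix
        Kc = H @ K @ H
        evals, evecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
        lam, V = evals[::-1][:q], evecs[:, ::-1][:, :q]
        A = V / np.sqrt(np.clip(lam, 1e-12, None))     # dual coefficients of the components
        return A, Kc @ A                               # training scores = Kc A

    def kpca_transform(K_new, K_train, A):
        # Scores of new points: centre k(x_new, x_train) with the training kernel means
        m, n = K_new.shape
        J, Jm = np.full((n, n), 1.0 / n), np.full((m, n), 1.0 / n)
        Kc_new = K_new - Jm @ K_train - K_new @ J + Jm @ K_train @ J
        return Kc_new @ A

    def nearest_mean_classify(T_train, y_train, T_new):
        # Plain nearest-class-mean rule on the scores (a stand-in for LDA on KPCA scores)
        classes = np.unique(y_train)
        means = np.stack([T_train[y_train == c].mean(0) for c in classes])
        dist = ((T_new[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        return classes[dist.argmin(1)]

With hypothetical training data X_tr, y_tr and test data X_te, the pipeline then reads:

    K_tr, K_te = rbf_kernel(X_tr, X_tr), rbf_kernel(X_te, X_tr)
    A, T_tr = kpca_fit(K_tr, q=5)
    y_hat = nearest_mean_classify(T_tr, y_tr, kpca_transform(K_te, K_tr, A))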

Author information

Correspondence to Michiel Debruyne.

About this article

Cite this article

Debruyne, M., Verdonck, T. Robust kernel principal component analysis and classification. Adv Data Anal Classif 4, 151–167 (2010). https://doi.org/10.1007/s11634-010-0068-1

