Kernels, Pre-images and Optimization

Abstract

In the last decade, kernel-based learning has become a state-of-the-art technology in Machine Learning. We briefly review kernel principal component analysis (kPCA) and the pre-image problem that arises in kPCA. Subsequently, we discuss a novel direction in which kernel-based models are used for property optimization. For this purpose, a stable estimate of the model's gradient is essential and non-trivial to obtain. The appropriate use of pre-image projections is key to successful gradient-based optimization, as we show for toy and real-world problems from quantum chemistry and physics.
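
To make the objects discussed above concrete, here is a minimal numpy sketch, not the chapter's implementation, of Gaussian-kernel PCA together with the classical fixed-point pre-image iteration of Mika et al. (1999); kernel centering is omitted in the pre-image step for brevity, and all function names are illustrative.

    import numpy as np

    def rbf_kernel(X, Y, gamma):
        # Gaussian (RBF) kernel matrix from pairwise squared distances
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-gamma * d2)

    def kpca_fit(X, gamma, n_components):
        # Kernel PCA: eigendecomposition of the centered Gram matrix
        n = X.shape[0]
        K = rbf_kernel(X, X, gamma)
        H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
        w, V = np.linalg.eigh(H @ K @ H)           # ascending eigenvalues
        w, V = w[::-1][:n_components], V[:, ::-1][:, :n_components]
        # Scale so each feature-space eigenvector has unit norm
        return V / np.sqrt(np.maximum(w, 1e-12))

    def preimage(x0, X, gamma, alphas, n_iter=100):
        # Fixed-point pre-image iteration for the RBF kernel:
        #   z <- sum_i c_i k(x_i, z) x_i / sum_i c_i k(x_i, z),
        # where c_i expand the kPCA projection of x0 over the training set
        # (centering terms omitted in this simplified sketch)
        kx = rbf_kernel(x0[None, :], X, gamma).ravel()
        coeffs = alphas @ (alphas.T @ kx)          # expansion coefficients c_i
        z = x0.copy()
        for _ in range(n_iter):
            kz = rbf_kernel(z[None, :], X, gamma).ravel()
            den = np.dot(coeffs, kz)
            if abs(den) < 1e-12:                   # degenerate expansion; stop
                break
            z = (coeffs * kz) @ X / den
        return z

    # Example: denoise a perturbed training point by projecting onto the
    # leading kernel principal components and mapping back to input space.
    X = np.random.randn(100, 2)
    alphas = kpca_fit(X, gamma=0.5, n_components=3)
    z = preimage(X[0] + 0.1 * np.random.randn(2), X, 0.5, alphas)

In the optimization setting described in the abstract, such pre-image projections can be used to keep gradient iterates close to the data manifold, which is what makes the gradient estimate stable.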

Notes

  1. In general, \(\mathbf{x}\) is not restricted to being in \({\mathbb{R}}^{m}\) and could be any object (see the sketch below).
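
For instance, a kernel can be defined directly on strings. The following is a minimal sketch (illustrative, not from the chapter) of the p-spectrum kernel, the inner product of substring-count feature maps, which is positive definite even though the inputs are not vectors:

    from collections import Counter

    def spectrum_kernel(s, t, p=3):
        # Count all length-p substrings of each string, then take the
        # inner product of the two count vectors
        cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
        ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
        return sum(cs[u] * ct[u] for u in cs)

    # spectrum_kernel("machine", "marine", p=2) == 3  (shared: "ma", "in", "ne")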

Acknowledgements

KRM thanks Vladimir N. Vapnik for continuous mentorship and collaboration since their first discussion in April 1995. This wonderful and serendipitous moment has profoundly changed the scientific agenda of KRM. From then on, KRM’s IDA group—then at GMD FIRST in Berlin—and later the offspring of this group have contributed actively to the exciting research on kernel methods. KRM acknowledges funding by the DFG, the BMBF, the EU and other sources that have helped in this endeavour. This work is supported by the World Class University Program through the National Research Foundation of Korea, funded by the Ministry of Education, Science, and Technology (grant R31-10008). JS and KB thank the NSF (Grant No. CHE-1240252) for funding.

Author information

Correspondence to John C. Snyder or Klaus-Robert Müller.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Snyder, J.C., Mika, S., Burke, K., Müller, K.R. (2013). Kernels, Pre-images and Optimization. In: Schölkopf, B., Luo, Z., Vovk, V. (eds.) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_21

  • DOI: https://doi.org/10.1007/978-3-642-41136-6_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6

  • eBook Packages: Computer Science, Computer Science (R0)
