Kernels, Pre-images and Optimization

Abstract

In the last decade, kernel-based learning has become a state-of-the-art technology in Machine Learning. We briefly review kernel principal component analysis (kPCA) and the pre-image problem that arises in kPCA. Subsequently, we discuss a novel direction in which kernel-based models are used for property optimization. For this purpose, a stable estimate of the model's gradient is essential and non-trivial to obtain. The appropriate use of pre-image projections is key to successful gradient-based optimization, as we show for toy and real-world problems from quantum chemistry and physics.
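
To make the objects discussed above concrete, here is a minimal numpy sketch, not the chapter's implementation, of Gaussian-kernel PCA together with the classical fixed-point pre-image iteration of Mika et al. (1999); kernel centering is omitted in the pre-image step for brevity, and all function names are illustrative.

    import numpy as np

    def rbf_kernel(X, Y, gamma):
        # Gaussian (RBF) kernel matrix from pairwise squared distances
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-gamma * d2)

    def kpca_fit(X, gamma, n_components):
        # Kernel PCA: eigendecomposition of the centered Gram matrix
        n = X.shape[0]
        K = rbf_kernel(X, X, gamma)
        H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
        w, V = np.linalg.eigh(H @ K @ H)           # ascending eigenvalues
        w, V = w[::-1][:n_components], V[:, ::-1][:, :n_components]
        # Scale so each feature-space eigenvector has unit norm
        return V / np.sqrt(np.maximum(w, 1e-12))

    def preimage(x0, X, gamma, alphas, n_iter=100):
        # Fixed-point pre-image iteration for the RBF kernel:
        #   z <- sum_i c_i k(x_i, z) x_i / sum_i c_i k(x_i, z),
        # where c_i expand the kPCA projection of x0 over the training set
        # (centering terms omitted in this simplified sketch)
        kx = rbf_kernel(x0[None, :], X, gamma).ravel()
        coeffs = alphas @ (alphas.T @ kx)          # expansion coefficients c_i
        z = x0.copy()
        for _ in range(n_iter):
            kz = rbf_kernel(z[None, :], X, gamma).ravel()
            den = np.dot(coeffs, kz)
            if abs(den) < 1e-12:                   # degenerate expansion; stop
                break
            z = (coeffs * kz) @ X / den
        return z

    # Example: denoise a perturbed training point by projecting onto the
    # leading kernel principal components and mapping back to input space.
    X = np.random.randn(100, 2)
    alphas = kpca_fit(X, gamma=0.5, n_components=3)
    z = preimage(X[0] + 0.1 * np.random.randn(2), X, 0.5, alphas)

In the optimization setting described in the abstract, such pre-image projections can be used to keep gradient iterates close to the data manifold, which is what makes the gradient estimate stable.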

Notes

  1. In general, \(\mathbf{x}\) is not restricted to being in \({\mathbb{R}}^{m}\) and could be any object (see the sketch below).
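
For instance, a kernel can be defined directly on strings. The following is a minimal sketch (illustrative, not from the chapter) of the p-spectrum kernel, the inner product of substring-count feature maps, which is positive definite even though the inputs are not vectors:

    from collections import Counter

    def spectrum_kernel(s, t, p=3):
        # Count all length-p substrings of each string, then take the
        # inner product of the two count vectors
        cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
        ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
        return sum(cs[u] * ct[u] for u in cs)

    # spectrum_kernel("machine", "marine", p=2) == 3  (shared: "ma", "in", "ne")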

Acknowledgements

KRM thanks Vladimir N. Vapnik for continuous mentorship and collaboration since their first discussion in April 1995. This wonderful and serendipitous moment has profoundly changed the scientific agenda of KRM. From then on, KRM’s IDA group—then at GMD FIRST in Berlin—and later the offspring of this group have contributed actively to the exciting research on kernel methods. KRM acknowledges funding by the DFG, the BMBF, the EU and other sources that have helped in this endeavour. This work is supported by the World Class University Program through the National Research Foundation of Korea, funded by the Ministry of Education, Science, and Technology (grant R31-10008). JS and KB thank the NSF (Grant No. CHE-1240252) for funding.

Author information

Correspondence to John C. Snyder or Klaus-Robert Müller.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Snyder, J.C., Mika, S., Burke, K., Müller, K.R. (2013). Kernels, Pre-images and Optimization. In: Schölkopf, B., Luo, Z., Vovk, V. (eds.) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_21

  • DOI: https://doi.org/10.1007/978-3-642-41136-6_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6

  • eBook Packages: Computer Science, Computer Science (R0)
