Skip to main content

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 49))

  • 4073 Accesses

Abstract

Kernel methods are a class of non-parametric learning techniques relying on kernels. A kernel generalizes dot products to arbitrary domains and can thus be seen as a similarity measure between data points with complex structures. The use of kernels allows to decouple the representation of the data from the specific learning algorithm, provided it can be defined in terms of distance or similarity between instances. Under this unifying formalism a wide range of methods have been developed, dealing with binary and multiclass classification, regression, ranking, clustering and novelty detection to name a few. Recent developments include statistical tests of dependency and alignments between related domains, such as documents written in different languages. Key to the success of any kernel method is the definition of an appropriate kernel for the data at hand. A well-designed kernel should capture the aspects characterizing similar instances while being computationally efficient. Building on the seminal work by D. Haussler on convolution kernels, a vast literature on kernels for structured data has arisen. Kernels have been designed for sequences, trees and graphs, as well as arbitrary relational data represented in first or higher order logic. From the representational viewpoint, this allowed to address one of the main limitations of statistical learning approaches, namely the difficulty to deal with complex domain knowledge. Interesting connections between the complementary fields of statistical and symbolic learning have arisen as one of the consequences. Another interesting connection made possible by kernels is between generative and discriminative learning. Here data are represented with generative models and appropriate kernels are built on top of them to be used in a discriminative setting.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aiolli, F., Da San Martino, G., Sperduti, A.: Route kernels for trees. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 17–24. ACM, New York (2009)

    Google Scholar 

  2. Amari, S.I.: Mathematical foundations of neurocomputing. Proceedings of the IEEE 78(9), 1443–1463 (1990)

    Article  Google Scholar 

  3. Amari, S.I.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)

    Article  Google Scholar 

  4. Aronszajn, N.: Theory of reproducing kernels. Trans. Amer. Math. Soc. 686, 337–404 (1950)

    Article  MathSciNet  Google Scholar 

  5. Bakir, G.H., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B., Vishwanathan, S.V.N.: Predicting Structured Data (Neural Information Processing). The MIT Press (2007)

    Google Scholar 

  6. Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups. Springer, New York (1984)

    Book  MATH  Google Scholar 

  7. Borgwardt, K.M.: Graph Kernels. PhD thesis, Ludwig-Maximilians-University Munich (2007)

    Google Scholar 

  8. Borgwardt, K.M., Kriegel, H.-P.: Shortest-path kernels on graphs. In: Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM 2005, pp. 74–81. IEEE Computer Society, Washington, DC (2005)

    Google Scholar 

  9. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifier. In: Proc. 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 144–152 (July 1992)

    Google Scholar 

  10. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14. MIT Press (2002)

    Google Scholar 

  11. Collins, M., Duffy, N.: New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia, PA, USA, pp. 263–270 (2002)

    Google Scholar 

  12. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)

    MATH  Google Scholar 

  13. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press (2000)

    Google Scholar 

  14. Cristianini, N., Kandola, J., Elisseeff, A., Shawe-Taylor, J.: On kernel-target alignment. In: Advances in Neural Information Processing Systems 14, vol. 14, pp. 367–373 (2002)

    Google Scholar 

  15. De Raedt, L.: Logical and Relational Learning. Springer (2008)

    Google Scholar 

  16. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1998)

    Google Scholar 

  17. Fletcher, R.: Practical Methods of Optimization, 2nd edn. John Wiley & Sons (1987)

    Google Scholar 

  18. Frasconi, P., Passerini, A.: Learning with Kernels and Logical Representations. In: De Raedt, L., Frasconi, P., Kersting, K., Muggleton, S.H. (eds.) Probabilistic Inductive Logic Programming. LNCS (LNAI), vol. 4911, pp. 56–91. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  19. Gärtner, T.: Exponential and geometric kernels for graphs. In: NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data (2002)

    Google Scholar 

  20. Gärtner, T., Flach, P., Kowalczyk, A., Smola, A.J.: Multi-instance kernels. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the 19th International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann (2002)

    Google Scholar 

  21. Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels for Structured Data. In: Matwin, S., Sammut, C. (eds.) ILP 2002. LNCS (LNAI), vol. 2583, pp. 66–83. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  22. Gärtner, T.: Kernels for Structured Data. PhD thesis, Universität Bonn (2005)

    Google Scholar 

  23. Gärtner, T., Flach, P.A., Wrobel, S.: On Graph Kernels: Hardness Results and Efficient Alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  24. Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels and distances for structured data. Mach. Learn. 57, 205–232 (2004)

    Article  MATH  Google Scholar 

  25. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.J.: A kernel method for the two-sample-problem. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 513–520. MIT Press, Cambridge (2007)

    Google Scholar 

  26. Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B., Smola, A.J.: A kernel statistical test of independence. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems 20. MIT Press, Cambridge (2008)

    Google Scholar 

  27. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)

    Google Scholar 

  28. Ham, J., Lee, D.D., Mika, S., Schölkopf, B.: A kernel view of the dimensionality reduction of manifolds. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, p. 47. ACM, New York (2004)

    Chapter  Google Scholar 

  29. Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz (1999)

    Google Scholar 

  30. Hoffmann, H.: Kernel pca for novelty detection. Pattern Recogn. 40, 863–874 (2007)

    Article  MATH  Google Scholar 

  31. Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Annals of Statistics 36(3), 1171–1220 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  32. Horváth, T., Gärtner, T., Wrobel, S.: Cyclic pattern kernels for predictive graph mining. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2004, pp. 158–167. ACM, New York (2004)

    Chapter  Google Scholar 

  33. Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. Journal of Computational Biology 7(1-2), 95–114 (2000)

    Article  Google Scholar 

  34. Jaakkola, T., Haussler, D.: Probabilistic kernel regression models. In: Proc. of Neural Information Processing Conference (1998)

    Google Scholar 

  35. Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 487–493. MIT Press, Cambridge (1999)

    Google Scholar 

  36. Jebara, T., Kondor, R., Howard, A.: Probability product kernels. J. Mach. Learn. Res. 5, 819–844 (2004)

    MathSciNet  MATH  Google Scholar 

  37. Joachims, T.: Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods – Support Vector Learning, ch. 11, pp. 169–185. MIT Press (1998)

    Google Scholar 

  38. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328. AAAI Press (2003)

    Google Scholar 

  39. Keerthi, S.S., Duan, K.B., Shevade, S.K., Poo, A.N.: A fast dual algorithm for kernel logistic regression. Mach. Learn. 61, 151–165 (2005)

    Article  MATH  Google Scholar 

  40. Kim, K., Franz, M.O., Schölkopf, B.: Iterative kernel principal component analysis for image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(9), 1351–1366 (2005)

    Article  Google Scholar 

  41. Kimeldorf, G., Wahba, G.: Some results on tchebycheffian spline functions. J. Math. Anal. Applic. 33, 82–95 (1971)

    Article  MathSciNet  MATH  Google Scholar 

  42. Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete input spaces. In: Sammut, C., Hoffmann, A. (eds.) Proc. of the 19th International Conference on Machine Learning, pp. 315–322. Morgan Kaufmann (2002)

    Google Scholar 

  43. Lanckriet, G.R.G., Cristianini, N., Bartlett, P., El Ghaoui, L., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5, 27–72 (2004)

    MATH  Google Scholar 

  44. Landwehr, N., Passerini, A., De Raedt, L., Frasconi, P.: kfoil: learning simple relational kernels. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 389–394. AAAI Press (2006)

    Google Scholar 

  45. Landwehr, N., Passerini, A., Raedt, L., Frasconi, P.: Fast learning of relational kernels. Mach. Learn. 78, 305–342 (2010)

    Article  Google Scholar 

  46. Leslie, C., Eskin, E., Noble, W.S.: The spectrum kernel: a string kernel for svm protein classification. In: Proc. of the Pacific Symposium on Biocomputing, pp. 564–575 (2002)

    Google Scholar 

  47. Leslie, C., Eskin, E., Weston, J., Noble, W.S.: Mismatch string kernels for svm protein classification. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1417–1424. MIT Press, Cambridge (2003)

    Google Scholar 

  48. Leslie, C., Kuang, R., Eskin, E.: Inexact matching string kernels for protein classificatio. In: Schölkopf, B., Tsuda, K., Vert, J.-P. (eds.) Kernel Methods in Computational Biology, MIT Press (2004) (in press)

    Google Scholar 

  49. Lodhi, H., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. In: Advances in Neural Information Processing Systems, pp. 563–569 (2000)

    Google Scholar 

  50. Da San Martino, G.: Kernel Methods for Tree Structured Data. PhD thesis, Department of Computer Science, University of Bologna (2009)

    Google Scholar 

  51. Menchetti, S.: Learning Preference and Structured Data: Theory and Applications. PhD thesis, Dipartimento di Sistemi e Informatica, DSI, Università di Firenze, Italy (December 2005)

    Google Scholar 

  52. Menchetti, S., Costa, F., Frasconi, P.: Weighted decomposition kernels. In: Proceedings of the 22nd International Conference on Machine Learning, ICML 2005, pp. 585–592. ACM, New York (2005)

    Google Scholar 

  53. Mercer, J.: Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London A 209, 415–446 (1909)

    Google Scholar 

  54. Micchelli, C.A., Xu, Y., Zhang, H.: Universal kernels. J. Mach. Learn. Res. 7, 2651–2667 (2006)

    MathSciNet  MATH  Google Scholar 

  55. Moschitti, A.: Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  56. Muggleton, S., De Raedt, L.: Inductive logic programming: Theory and methods. Journal of Logic Programming 19/20, 629–679 (1994)

    Article  MathSciNet  Google Scholar 

  57. Passerini, A., Frasconi, P., De Raedt, L.: Kernels on prolog proof trees: Statistical learning in the ILP setting. Journal of Machine Learning Research 7, 307–342 (2006)

    MathSciNet  MATH  Google Scholar 

  58. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Burges, C., Schölkopf, B. (eds.) Advances in Kernel Methods–Support Vector Learning. MIT Press (1998)

    Google Scholar 

  59. Poggio, T., Smale, S.: The mathematics of learning: Dealing with data. Notices of the American Mathematical Society 50(5), 537–544 (2003)

    MathSciNet  MATH  Google Scholar 

  60. Quinlan, J.R.: Learning Logical Definitions from Relations. Machine Learning 5, 239–266 (1990)

    Google Scholar 

  61. Rabiner, L.R.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)

    Article  Google Scholar 

  62. Rakotomamonjy, A., Bach, F.R., Canu, S., Grandvalet, Y.: SimpleMKL. Journal of Machine Learning Research 9, 2491–2521 (2008)

    MathSciNet  MATH  Google Scholar 

  63. Rasmussenand, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press (December 2005)

    Google Scholar 

  64. Saitoh, S.: Theory of Reproducing Kernels and its Applications. Longman Scientific Technical, Harlow (1988)

    MATH  Google Scholar 

  65. Saunders, G., Gammerman, A., Vovk, V.: Ridge regression learning algorithm in dual variables. In: Proc. 15th International Conf. on Machine Learning, pp. 515–521 (1998)

    Google Scholar 

  66. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high dimensional distribution. Neural Computation 13, 1443–1471 (2001)

    Article  MATH  Google Scholar 

  67. Schölkopf, B., Smola, A., Müller, K.R.: Kernel principal component analysis. In: Advances in Kernel Methods–Support Vector Learning, pp. 327–352. MIT Press (1999)

    Google Scholar 

  68. Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge (2002)

    Google Scholar 

  69. Schölkopf, B., Warmuth, M.K. (eds.): COLT/Kernel 2003. LNCS (LNAI), vol. 2777. Springer, Heidelberg (2003)

    MATH  Google Scholar 

  70. Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000)

    Article  Google Scholar 

  71. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)

    Book  Google Scholar 

  72. Shin, K., Kuboyama, T.: A generalization of haussler’s convolution kernel: mapping kernel. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 944–951. ACM, New York (2008)

    Chapter  Google Scholar 

  73. Sterling, L., Shapiro, E.: The art of Prolog: advanced programming techniques, 2nd edn. MIT Press, Cambridge (1994)

    MATH  Google Scholar 

  74. Swamidass, S.J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., Baldi, P.: Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics 21, 359–368 (2005)

    Article  Google Scholar 

  75. Tax, D.M.J., Duin, R.P.W.: Support vector domain description. Pattern Recognition Letters 20, 1991–(1999)

    Article  Google Scholar 

  76. Tikhonov, A.N.: On solving ill-posed problem and method of regularization. Dokl. Akad. Nauk USSR 153, 501–504 (1963)

    Google Scholar 

  77. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. JMLR 6, 1453–1484 (2005)

    MathSciNet  MATH  Google Scholar 

  78. Tsuda, K., Kin, T., Asai, K.: Marginalized kernels for biological sequences. Bioinformatics 18(suppl. 1), S268–S275 (2002)

    Article  Google Scholar 

  79. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  80. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)

    MATH  Google Scholar 

  81. Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.: Graph kernels. Journal of Machine Learning Research 11, 1201–1242 (2010)

    MathSciNet  MATH  Google Scholar 

  82. Vishwanathan, S.V.N., Smola, A.: Fast Kernels for String and Tree Matching. Advances in Neural Information Processing Systems 15 (2003)

    Google Scholar 

  83. Wahba, G.: Splines Models for Observational Data. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)

    Book  Google Scholar 

  84. Wale, N., Watson, I.A., Karypis, G.: Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl. Inf. Syst. 14, 347–375 (2008)

    Article  Google Scholar 

  85. Watkins, C.: Dynamic alignment kernels. In: Smola, A.J., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.) Advances in Large Margin Classiers, pp. 39–50. MIT Press (2000)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andrea Passerini .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Passerini, A. (2013). Kernel Methods for Structured Data. In: Bianchini, M., Maggini, M., Jain, L. (eds) Handbook on Neural Information Processing. Intelligent Systems Reference Library, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36657-4_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-36657-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36656-7

  • Online ISBN: 978-3-642-36657-4

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics