Kernel Methods for Structured Data

Passerini, Andrea

doi:10.1007/978-3-642-36657-4_9

Andrea Passerini⁴

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 49))

4073 Accesses

Abstract

Kernel methods are a class of non-parametric learning techniques relying on kernels. A kernel generalizes dot products to arbitrary domains and can thus be seen as a similarity measure between data points with complex structures. The use of kernels allows to decouple the representation of the data from the specific learning algorithm, provided it can be defined in terms of distance or similarity between instances. Under this unifying formalism a wide range of methods have been developed, dealing with binary and multiclass classification, regression, ranking, clustering and novelty detection to name a few. Recent developments include statistical tests of dependency and alignments between related domains, such as documents written in different languages. Key to the success of any kernel method is the definition of an appropriate kernel for the data at hand. A well-designed kernel should capture the aspects characterizing similar instances while being computationally efficient. Building on the seminal work by D. Haussler on convolution kernels, a vast literature on kernels for structured data has arisen. Kernels have been designed for sequences, trees and graphs, as well as arbitrary relational data represented in first or higher order logic. From the representational viewpoint, this allowed to address one of the main limitations of statistical learning approaches, namely the difficulty to deal with complex domain knowledge. Interesting connections between the complementary fields of statistical and symbolic learning have arisen as one of the consequences. Another interesting connection made possible by kernels is between generative and discriminative learning. Here data are represented with generative models and appropriate kernels are built on top of them to be used in a discriminative setting.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Aiolli, F., Da San Martino, G., Sperduti, A.: Route kernels for trees. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 17–24. ACM, New York (2009)
Google Scholar
Amari, S.I.: Mathematical foundations of neurocomputing. Proceedings of the IEEE 78(9), 1443–1463 (1990)
Article Google Scholar
Amari, S.I.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)
Article Google Scholar
Aronszajn, N.: Theory of reproducing kernels. Trans. Amer. Math. Soc. 686, 337–404 (1950)
Article MathSciNet Google Scholar
Bakir, G.H., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B., Vishwanathan, S.V.N.: Predicting Structured Data (Neural Information Processing). The MIT Press (2007)
Google Scholar
Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups. Springer, New York (1984)
Book MATH Google Scholar
Borgwardt, K.M.: Graph Kernels. PhD thesis, Ludwig-Maximilians-University Munich (2007)
Google Scholar
Borgwardt, K.M., Kriegel, H.-P.: Shortest-path kernels on graphs. In: Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM 2005, pp. 74–81. IEEE Computer Society, Washington, DC (2005)
Google Scholar
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifier. In: Proc. 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 144–152 (July 1992)
Google Scholar
Collins, M., Duffy, N.: Convolution kernels for natural language. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14. MIT Press (2002)
Google Scholar
Collins, M., Duffy, N.: New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia, PA, USA, pp. 263–270 (2002)
Google Scholar
Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)
MATH Google Scholar
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press (2000)
Google Scholar
Cristianini, N., Kandola, J., Elisseeff, A., Shawe-Taylor, J.: On kernel-target alignment. In: Advances in Neural Information Processing Systems 14, vol. 14, pp. 367–373 (2002)
Google Scholar
De Raedt, L.: Logical and Relational Learning. Springer (2008)
Google Scholar
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1998)
Google Scholar
Fletcher, R.: Practical Methods of Optimization, 2nd edn. John Wiley & Sons (1987)
Google Scholar
Frasconi, P., Passerini, A.: Learning with Kernels and Logical Representations. In: De Raedt, L., Frasconi, P., Kersting, K., Muggleton, S.H. (eds.) Probabilistic Inductive Logic Programming. LNCS (LNAI), vol. 4911, pp. 56–91. Springer, Heidelberg (2008)
Chapter Google Scholar
Gärtner, T.: Exponential and geometric kernels for graphs. In: NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data (2002)
Google Scholar
Gärtner, T., Flach, P., Kowalczyk, A., Smola, A.J.: Multi-instance kernels. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the 19th International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann (2002)
Google Scholar
Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels for Structured Data. In: Matwin, S., Sammut, C. (eds.) ILP 2002. LNCS (LNAI), vol. 2583, pp. 66–83. Springer, Heidelberg (2003)
Chapter Google Scholar
Gärtner, T.: Kernels for Structured Data. PhD thesis, Universität Bonn (2005)
Google Scholar
Gärtner, T., Flach, P.A., Wrobel, S.: On Graph Kernels: Hardness Results and Efficient Alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003)
Chapter Google Scholar
Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels and distances for structured data. Mach. Learn. 57, 205–232 (2004)
Article MATH Google Scholar
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.J.: A kernel method for the two-sample-problem. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 513–520. MIT Press, Cambridge (2007)
Google Scholar
Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B., Smola, A.J.: A kernel statistical test of independence. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems 20. MIT Press, Cambridge (2008)
Google Scholar
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)
Google Scholar
Ham, J., Lee, D.D., Mika, S., Schölkopf, B.: A kernel view of the dimensionality reduction of manifolds. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, p. 47. ACM, New York (2004)
Chapter Google Scholar
Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz (1999)
Google Scholar
Hoffmann, H.: Kernel pca for novelty detection. Pattern Recogn. 40, 863–874 (2007)
Article MATH Google Scholar
Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Annals of Statistics 36(3), 1171–1220 (2008)
Article MathSciNet MATH Google Scholar
Horváth, T., Gärtner, T., Wrobel, S.: Cyclic pattern kernels for predictive graph mining. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2004, pp. 158–167. ACM, New York (2004)
Chapter Google Scholar
Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. Journal of Computational Biology 7(1-2), 95–114 (2000)
Article Google Scholar
Jaakkola, T., Haussler, D.: Probabilistic kernel regression models. In: Proc. of Neural Information Processing Conference (1998)
Google Scholar
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 487–493. MIT Press, Cambridge (1999)
Google Scholar
Jebara, T., Kondor, R., Howard, A.: Probability product kernels. J. Mach. Learn. Res. 5, 819–844 (2004)
MathSciNet MATH Google Scholar
Joachims, T.: Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods – Support Vector Learning, ch. 11, pp. 169–185. MIT Press (1998)
Google Scholar
Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328. AAAI Press (2003)
Google Scholar
Keerthi, S.S., Duan, K.B., Shevade, S.K., Poo, A.N.: A fast dual algorithm for kernel logistic regression. Mach. Learn. 61, 151–165 (2005)
Article MATH Google Scholar
Kim, K., Franz, M.O., Schölkopf, B.: Iterative kernel principal component analysis for image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(9), 1351–1366 (2005)
Article Google Scholar
Kimeldorf, G., Wahba, G.: Some results on tchebycheffian spline functions. J. Math. Anal. Applic. 33, 82–95 (1971)
Article MathSciNet MATH Google Scholar
Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete input spaces. In: Sammut, C., Hoffmann, A. (eds.) Proc. of the 19th International Conference on Machine Learning, pp. 315–322. Morgan Kaufmann (2002)
Google Scholar
Lanckriet, G.R.G., Cristianini, N., Bartlett, P., El Ghaoui, L., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5, 27–72 (2004)
MATH Google Scholar
Landwehr, N., Passerini, A., De Raedt, L., Frasconi, P.: kfoil: learning simple relational kernels. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 389–394. AAAI Press (2006)
Google Scholar
Landwehr, N., Passerini, A., Raedt, L., Frasconi, P.: Fast learning of relational kernels. Mach. Learn. 78, 305–342 (2010)
Article Google Scholar
Leslie, C., Eskin, E., Noble, W.S.: The spectrum kernel: a string kernel for svm protein classification. In: Proc. of the Pacific Symposium on Biocomputing, pp. 564–575 (2002)
Google Scholar
Leslie, C., Eskin, E., Weston, J., Noble, W.S.: Mismatch string kernels for svm protein classification. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1417–1424. MIT Press, Cambridge (2003)
Google Scholar
Leslie, C., Kuang, R., Eskin, E.: Inexact matching string kernels for protein classificatio. In: Schölkopf, B., Tsuda, K., Vert, J.-P. (eds.) Kernel Methods in Computational Biology, MIT Press (2004) (in press)
Google Scholar
Lodhi, H., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. In: Advances in Neural Information Processing Systems, pp. 563–569 (2000)
Google Scholar
Da San Martino, G.: Kernel Methods for Tree Structured Data. PhD thesis, Department of Computer Science, University of Bologna (2009)
Google Scholar
Menchetti, S.: Learning Preference and Structured Data: Theory and Applications. PhD thesis, Dipartimento di Sistemi e Informatica, DSI, Università di Firenze, Italy (December 2005)
Google Scholar
Menchetti, S., Costa, F., Frasconi, P.: Weighted decomposition kernels. In: Proceedings of the 22nd International Conference on Machine Learning, ICML 2005, pp. 585–592. ACM, New York (2005)
Google Scholar
Mercer, J.: Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London A 209, 415–446 (1909)
Google Scholar
Micchelli, C.A., Xu, Y., Zhang, H.: Universal kernels. J. Mach. Learn. Res. 7, 2651–2667 (2006)
MathSciNet MATH Google Scholar
Moschitti, A.: Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)
Chapter Google Scholar
Muggleton, S., De Raedt, L.: Inductive logic programming: Theory and methods. Journal of Logic Programming 19/20, 629–679 (1994)
Article MathSciNet Google Scholar
Passerini, A., Frasconi, P., De Raedt, L.: Kernels on prolog proof trees: Statistical learning in the ILP setting. Journal of Machine Learning Research 7, 307–342 (2006)
MathSciNet MATH Google Scholar
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Burges, C., Schölkopf, B. (eds.) Advances in Kernel Methods–Support Vector Learning. MIT Press (1998)
Google Scholar
Poggio, T., Smale, S.: The mathematics of learning: Dealing with data. Notices of the American Mathematical Society 50(5), 537–544 (2003)
MathSciNet MATH Google Scholar
Quinlan, J.R.: Learning Logical Definitions from Relations. Machine Learning 5, 239–266 (1990)
Google Scholar
Rabiner, L.R.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
Article Google Scholar
Rakotomamonjy, A., Bach, F.R., Canu, S., Grandvalet, Y.: SimpleMKL. Journal of Machine Learning Research 9, 2491–2521 (2008)
MathSciNet MATH Google Scholar
Rasmussenand, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press (December 2005)
Google Scholar
Saitoh, S.: Theory of Reproducing Kernels and its Applications. Longman Scientific Technical, Harlow (1988)
MATH Google Scholar
Saunders, G., Gammerman, A., Vovk, V.: Ridge regression learning algorithm in dual variables. In: Proc. 15th International Conf. on Machine Learning, pp. 515–521 (1998)
Google Scholar
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high dimensional distribution. Neural Computation 13, 1443–1471 (2001)
Article MATH Google Scholar
Schölkopf, B., Smola, A., Müller, K.R.: Kernel principal component analysis. In: Advances in Kernel Methods–Support Vector Learning, pp. 327–352. MIT Press (1999)
Google Scholar
Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge (2002)
Google Scholar
Schölkopf, B., Warmuth, M.K. (eds.): COLT/Kernel 2003. LNCS (LNAI), vol. 2777. Springer, Heidelberg (2003)
MATH Google Scholar
Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000)
Article Google Scholar
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)
Book Google Scholar
Shin, K., Kuboyama, T.: A generalization of haussler’s convolution kernel: mapping kernel. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 944–951. ACM, New York (2008)
Chapter Google Scholar
Sterling, L., Shapiro, E.: The art of Prolog: advanced programming techniques, 2nd edn. MIT Press, Cambridge (1994)
MATH Google Scholar
Swamidass, S.J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., Baldi, P.: Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics 21, 359–368 (2005)
Article Google Scholar
Tax, D.M.J., Duin, R.P.W.: Support vector domain description. Pattern Recognition Letters 20, 1991–(1999)
Article Google Scholar
Tikhonov, A.N.: On solving ill-posed problem and method of regularization. Dokl. Akad. Nauk USSR 153, 501–504 (1963)
Google Scholar
Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. JMLR 6, 1453–1484 (2005)
MathSciNet MATH Google Scholar
Tsuda, K., Kin, T., Asai, K.: Marginalized kernels for biological sequences. Bioinformatics 18(suppl. 1), S268–S275 (2002)
Article Google Scholar
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Article MathSciNet MATH Google Scholar
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
MATH Google Scholar
Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.: Graph kernels. Journal of Machine Learning Research 11, 1201–1242 (2010)
MathSciNet MATH Google Scholar
Vishwanathan, S.V.N., Smola, A.: Fast Kernels for String and Tree Matching. Advances in Neural Information Processing Systems 15 (2003)
Google Scholar
Wahba, G.: Splines Models for Observational Data. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
Book Google Scholar
Wale, N., Watson, I.A., Karypis, G.: Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl. Inf. Syst. 14, 347–375 (2008)
Article Google Scholar
Watkins, C.: Dynamic alignment kernels. In: Smola, A.J., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.) Advances in Large Margin Classiers, pp. 39–50. MIT Press (2000)
Google Scholar

Download references

Author information

Authors and Affiliations

Dipartimento di Ingegneria e Scienza dell’Informazione, Università degli Studi di Trento, Trento, Italy
Andrea Passerini

Authors

Andrea Passerini
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Andrea Passerini .

Editor information

Editors and Affiliations

, Dipto. Ingegneria dell'Informazione, Università degli Studi di Siena, Via Roma 56, Siena, 53100, Italy
Monica Bianchini
Fac. Ingegneria, Dipto. Ingegneria dell'Informazione, Università Siena, Via Roma 56, Siena, 53100, Italy
Marco Maggini
University of Canberra, School of Electrical and Information, Adjunct Professor, Mawson Lakes Campus, ACT, 2601, Australia
Lakhmi C. Jain

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Passerini, A. (2013). Kernel Methods for Structured Data. In: Bianchini, M., Maggini, M., Jain, L. (eds) Handbook on Neural Information Processing. Intelligent Systems Reference Library, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36657-4_9

Download citation

DOI: https://doi.org/10.1007/978-3-642-36657-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36656-7
Online ISBN: 978-3-642-36657-4
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics