Variants of Tree Kernels for XML Documents

  • Conference paper
MICAI 2007: Advances in Artificial Intelligence (MICAI 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4827)

Abstract

In this paper, we discuss tree kernels that can be applied to the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node may be labeled by a vector of attributes, including its XML tag and its textual content. We describe four new kernels suitable for this kind of tree: a tree kernel derived from the well-known parse tree kernel; the set tree kernel, which allows permutations of children; the string tree kernel, an extension of the so-called partial tree kernel; and the soft tree kernel, which is based on the set tree kernel and takes into account a “fuzzy” comparison of child positions. We present first results on an artificial data set, on a corpus of newspaper articles for which we want to determine the type (genre) of an article based on its structure alone, and on the well-known SUSANNE corpus.
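
For orientation, the well-known parse tree kernel mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of the standard Collins and Duffy recursion on ordered labeled trees, not the paper's own kernels. The tuple representation of trees, the function names, and the decay parameter lam are assumptions made for this example.

```python
from itertools import product

# Assumed representation: a tree is (label, [children]); a leaf has no children.

def nodes(tree):
    """Yield every node of the tree in preorder."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def production(node):
    """A node's label together with the ordered labels of its children."""
    label, children = node
    return (label, tuple(child[0] for child in children))

def delta(n1, n2, lam=1.0):
    """Weighted count of common tree fragments rooted at n1 and n2."""
    if not n1[1] or not n2[1]:
        return 0.0  # a leaf roots no fragment
    if production(n1) != production(n2):
        return 0.0  # different label or different ordered child labels
    result = lam
    for c1, c2 in zip(n1[1], n2[1]):
        result *= 1.0 + delta(c1, c2, lam)
    return result

def parse_tree_kernel(t1, t2, lam=1.0):
    """Sum delta over all pairs of nodes from the two trees."""
    return sum(delta(a, b, lam) for a, b in product(nodes(t1), nodes(t2)))

# Two toy DOM-like trees that differ only in the order of the children.
doc_a = ("doc", [("sec", [("p", [])]), ("p", [])])
doc_b = ("doc", [("p", []), ("sec", [("p", [])])])

print(parse_tree_kernel(doc_a, doc_a))
print(parse_tree_kernel(doc_a, doc_b))
```

Because productions are compared with their children in order, swapping two children of doc removes every match rooted at doc, and only the shared sec subtree still contributes. This sensitivity to child order is the kind of restriction that a kernel allowing permutations of children is meant to relax.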

Editor information

Alexander Gelbukh, Ángel Fernando Kuri Morales

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geibel, P., Gust, H., Kühnberger, K.-U. (2007). Variants of Tree Kernels for XML Documents. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science, vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_81

  • DOI: https://doi.org/10.1007/978-3-540-76631-5_81

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76630-8

  • Online ISBN: 978-3-540-76631-5

  • eBook Packages: Computer Science, Computer Science (R0)
