Abstract
In this paper, we discuss tree kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees, in which every node might be labeled by a vector of attributes including its XML tag and the textual content. We describe four new kernels suitable for this kind of trees: a tree kernel derived from the well-known parse tree kernel, the set tree kernel that allows permutations of children, the string tree kernel being an extension of the so-called partial tree kernel, and the soft tree kernel, which is based on the set tree kernel and takes into a account a “fuzzy” comparison of child positions. We present first results on an artificial data set, a corpus of newspaper articles, for which we want to determine the type (genre) of an article based on its structure alone, and the well-known SUSANNE corpus.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Mehler, A., Gleim, R., Dehmer, M.: Towards structure-sensitive hypertext categorization. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A. (eds.) Proceedings of the 29th Annual Conference of the German Classification Society, Springer, Heidelberg (2005)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
Gärtner, T.: A survey of kernels for structured data. SIGKDD Explorations 5(2), 49–58 (2003)
Schoelkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)
Collins, M., Duffy, N.: Convolution kernels for natural language. In: NIPS, pp. 625–632 (2001)
Moschitti, A.: A study on convolution kernels for shallow statistic parsing. In: ACL, pp. 335–342 (2004)
Haussler, D.: Convolution Kernels on Discrete Structure. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz, Santa Cruz, CA, USA (1999)
Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: ICML, pp. 291–298 (2002)
Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.J.C.H.: Text classification using string kernels. Journal of Machine Learning Research 2, 419–444 (2002)
Geibel, P., Wysotzki, F.: Learning relational concepts with decision trees. In: Saitta, L. (ed.) Machine Learning: Proceedings of the Thirteenth International Conference, pp. 166–174. Morgan Kaufmann, San Francisco (1996)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines, Software (2001), available at http://www.csie.ntu.edu.tw/cjlin/libsvm
Sampson, G.: English for the Computer: The Susanne Corpus and Analytic Scheme: SUSANNE Corpus and Analytic Scheme. Clarendon Press (1995)
Mehler, A., Geibel, P., Pustylnikov, O., Herold, S.: Structural classifiers of text types. LDV Forum (to appear, 2007)
Geibel, P., Jain, B.J., Wysotzki, F.: Combining recurrent neural networks and support vector machines for structural pattern recognition. Neurocomputing 64, 63–105 (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geibel, P., Gust, H., Kühnberger, KU. (2007). Variants of Tree Kernels for XML Documents. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science(), vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_81
Download citation
DOI: https://doi.org/10.1007/978-3-540-76631-5_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76630-8
Online ISBN: 978-3-540-76631-5
eBook Packages: Computer ScienceComputer Science (R0)