Abstract
In this paper, we discuss the structure-based classification of documents according to their logical document structure, i.e., their DOM trees. We describe a method that uses predefined structural features, as well as four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus. We demonstrate that, for both corpora, many text types can be learned from structural features alone.
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geibel, P., Krumnack, U., Pustylnikov, O., Mehler, A., Gust, H., Kühnberger, K.-U. (2007). Structure-Sensitive Learning of Text Types. In: Orgun, M.A., Thornton, J. (eds) AI 2007: Advances in Artificial Intelligence. AI 2007. Lecture Notes in Computer Science, vol 4830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76928-6_68
DOI: https://doi.org/10.1007/978-3-540-76928-6_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76926-2
Online ISBN: 978-3-540-76928-6
eBook Packages: Computer Science, Computer Science (R0)