Structure-Sensitive Learning of Text Types

  • Conference paper
AI 2007: Advances in Artificial Intelligence (AI 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4830)

Abstract

In this paper, we discuss the classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method that uses predefined structural features, as well as four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus. We demonstrate that, for both corpora, many text types can be learned from structural features alone.
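
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of structure-only document comparison. It is not the authors' method: the feature set in `structural_features` (node count, depth, leaves, mean branching) and the tag-path kernel in `path_kernel` are illustrative assumptions, and the kernel is a deliberately simple stand-in rather than one of the four tree kernels evaluated in the paper. The toy documents `doc_a` and `doc_b` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structural_features(root):
    """Return a few hand-picked structural features of a DOM tree.

    The choice of features is an assumption made for illustration only.
    """
    depths = []
    out_degrees = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        children = list(node)
        depths.append(depth)
        out_degrees.append(len(children))
        stack.extend((child, depth + 1) for child in children)
    inner = [d for d in out_degrees if d > 0]
    return {
        "nodes": len(depths),
        "max_depth": max(depths),
        "leaves": out_degrees.count(0),
        "mean_branching": sum(inner) / len(inner) if inner else 0.0,
    }


def tag_path_counts(root):
    """Count root-to-node tag paths; text content is ignored entirely."""
    counts = Counter()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + (child.tag,)))
    return counts


def path_kernel(root_a, root_b):
    """Dot product of tag-path count vectors (a simple structure-only kernel)."""
    a, b = tag_path_counts(root_a), tag_path_counts(root_b)
    return sum(count * b[path] for path, count in a.items())


# Hypothetical toy documents: only the element structure matters here.
doc_a = ET.fromstring("<article><head/><body><p/><p/></body></article>")
doc_b = ET.fromstring("<article><body><p/></body></article>")

print(structural_features(doc_a))
print(path_kernel(doc_a, doc_b))
```

Feature dictionaries of this kind could be fed to any standard vector-based classifier, while a kernel function could be used to build a precomputed Gram matrix for a kernel method. Both routes use only the tree shape and element names, never the words of the text.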

Editor information

Mehmet A. Orgun, John Thornton

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geibel, P., Krumnack, U., Pustylnikov, O., Mehler, A., Gust, H., Kühnberger, KU. (2007). Structure-Sensitive Learning of Text Types. In: Orgun, M.A., Thornton, J. (eds) AI 2007: Advances in Artificial Intelligence. AI 2007. Lecture Notes in Computer Science, vol 4830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76928-6_68

  • DOI: https://doi.org/10.1007/978-3-540-76928-6_68

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76926-2

  • Online ISBN: 978-3-540-76928-6

  • eBook Packages: Computer Science, Computer Science (R0)
