The Szeged Treebank

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Abstract

The major aim of the Szeged Treebank project was to create a high-quality database of syntactic structures for Hungarian that can serve as a gold standard for further research in linguistics and computational language processing. The treebank currently contains full syntactic parses of about 82,000 sentences, all produced by accurate manual annotation. This paper describes the linguistic theory as well as the actual method used in the annotation process. It also presents the application of the treebank to the training of automated syntactic parsers.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Csendes, D., Csirik, J., Gyimóthy, T., Kocsor, A. (2005). The Szeged Treebank. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_16

  • DOI: https://doi.org/10.1007/11551874_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28789-6

  • Online ISBN: 978-3-540-31817-0

  • eBook Packages: Computer Science, Computer Science (R0)
