
A Multilingual GRUG Treebank for Underresourced Languages

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)


Abstract

In this paper, we describe the outcomes of a project to build treebanks for the underresourced languages Georgian, Russian, and Ukrainian, together with German, one of the “major” languages in the NLT world. Parallel sentences in the four languages were syntactically annotated by hand, one language at a time, using the Synpathy tool. The tagsets follow an adapted version of the German TIGER guidelines, with the changes needed for a formal grammatical description of Georgian, Russian, and Ukrainian. The output of the monolingual syntactic annotation is in the TIGER-XML format. The monolingual repositories were aligned into bilingual treebanks with the Stockholm TreeAligner software. A demo of the GRUG treebank resources will be given during a poster session.
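To make the TIGER-XML output concrete, the following is a minimal sketch of how one annotated sentence in that format could be read with Python's standard library. The sample sentence, its labels, and the helper names `bracket` and `sentences` are illustrative assumptions and are not taken from the GRUG treebank; only the general element layout (`s`, `graph`, `terminals`/`t`, `nonterminals`/`nt`, `edge`) follows the TIGER-XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-sentence corpus in TIGER-XML layout (not GRUG data).
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_501">
        <terminals>
          <t id="s1_1" word="Der" pos="ART"/>
          <t id="s1_2" word="Hund" pos="NN"/>
          <t id="s1_3" word="schläft" pos="VVFIN"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="NP">
            <edge label="NK" idref="s1_1"/>
            <edge label="NK" idref="s1_2"/>
          </nt>
          <nt id="s1_501" cat="S">
            <edge label="SB" idref="s1_500"/>
            <edge label="HD" idref="s1_3"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def bracket(graph):
    """Render one <graph> element as a labelled bracket string."""
    # Index terminals (<t>) and nonterminals (<nt>) by their id attribute.
    nodes = {n.get("id"): n for n in graph.iter() if n.tag in ("t", "nt")}

    def render(node_id):
        node = nodes[node_id]
        if node.tag == "t":
            # Terminal: part-of-speech tag plus word form.
            return f"({node.get('pos')} {node.get('word')})"
        # Nonterminal: category plus children, each prefixed by its edge label.
        children = " ".join(
            f"{e.get('label')}:{render(e.get('idref'))}"
            for e in node.findall("edge")
        )
        return f"({node.get('cat')} {children})"

    return render(graph.get("root"))


def sentences(xml_text):
    """Map each sentence id in a TIGER-XML string to its bracket rendering."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): bracket(s.find("graph")) for s in root.iter("s")}


if __name__ == "__main__":
    for sent_id, tree in sentences(SAMPLE).items():
        print(sent_id, tree)
```

Because each language is annotated in its own file with stable node identifiers, a separate alignment layer can refer to nodes by id without modifying the monolingual annotation; the sketch above covers only the monolingual side.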




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kapanadze, O., Mishchenko, A. (2013). A Multilingual GRUG Treebank for Underresourced Languages. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_5


  • DOI: https://doi.org/10.1007/978-3-642-37247-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
