Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian

  • Conference paper
Text, Speech, and Dialogue (TSD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

A new syntactic formalism for dependency parsing of Croatian is presented, together with its implementation in the SETimes Dependency Treebank of Croatian (the SETimes.HR treebank). Its new syntactic tagset is designed to improve dependency parsing accuracy, with special emphasis on the main syntactic categories such as predicates, subjects and objects. The formalism is compared with two versions of the Croatian Dependency Treebank (HOBS): one with explicit encoding of subordinate syntactic conjunctions and one without. Both manual annotation quality and dependency parsing accuracy were evaluated. Inter-annotator agreement improved: Cohen’s kappa coefficient for label attachment, κ(LA), peaked at 0.92, exceeding the two HOBS instances by 0.036 and 0.081 points. Overall dependency parsing accuracy reached a labeled attachment score (LAS) of 77.49, which is 2.99 and 5.78 points above the two HOBS versions, using a standard graph-based dependency parser.
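The two figures reported in the abstract, Cohen's kappa and LAS, can be illustrated with a minimal sketch. It assumes that each token's annotation is a (head index, dependency label) pair and that κ(LA) treats each such pair as a single category; the abstract does not spell out the exact definition, so this is an assumption. The function names and the toy labels (Sb, Pred, Obj, Atr) are illustrative and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(annotations_a, annotations_b):
    """Cohen's kappa for two annotators over the same tokens.

    Each annotation is any hashable category; here we assume a
    (head, label) pair per token (an assumption, not the paper's spec).
    """
    if len(annotations_a) != len(annotations_b) or not annotations_a:
        raise ValueError("annotation lists must be non-empty and equally long")
    n = len(annotations_a)
    # Observed agreement: share of tokens annotated identically.
    observed = sum(a == b for a, b in zip(annotations_a, annotations_b)) / n
    # Chance agreement from each annotator's category distribution.
    freq_a = Counter(annotations_a)
    freq_b = Counter(annotations_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def labeled_attachment_score(gold, predicted):
    """LAS: percentage of tokens whose head AND label both match gold."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("token lists must be non-empty and equally long")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Toy four-token sentence; the last token's head differs.
gold = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (3, "Atr")]
other = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (2, "Atr")]

kappa = cohen_kappa(gold, other)
las = labeled_attachment_score(gold, other)
```

On real data the same functions would be applied to all tokens of a doubly annotated sample (for kappa) or of a held-out test set (for LAS).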


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agić, Ž., Merkler, D. (2013). Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_70

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_70

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
