Abstract
This paper presents a new syntactic formalism for dependency parsing of Croatian and its implementation in the SETimes Dependency Treebank of Croatian (the Setimes.Hr Treebank). Its new syntactic tagset is aimed at improving dependency parsing accuracy, with special emphasis on the main syntactic categories such as predicates, subjects and objects. The treebank is compared with two versions of the Croatian Dependency Treebank (HOBS): one with explicit encoding of subordinate syntactic conjunctions and one without. Both manual annotation quality and dependency parsing accuracy were inspected. Inter-annotator agreement improved: Cohen’s kappa coefficient for label attachment, κ(LA), peaked at 0.92, topping the two HOBS instances by 0.036 and 0.081 points. Overall dependency parsing accuracy reached a labeled attachment score (LAS) of 77.49, which is 2.99 and 5.78 points above the two HOBS versions, using a standard graph-based dependency parser.
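The two measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that each token's annotation is a (head, label) pair, that κ(LA) treats the whole pair as a single category, and that attachment scores are reported as percentages over all tokens; the abstract does not spell out these details, so they are assumptions. The function names and the toy labels are hypothetical and do not come from the treebank.

```python
from collections import Counter


def cohen_kappa(annotations_a, annotations_b):
    """Cohen's kappa for two annotators over the same token sequence.

    Each item is one token's annotation, here a (head, label) pair
    treated as a single category (an assumption about kappa(LA)).
    """
    if len(annotations_a) != len(annotations_b) or not annotations_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(annotations_a)
    # Observed agreement: share of tokens annotated identically.
    observed = sum(a == b for a, b in zip(annotations_a, annotations_b)) / n
    # Chance agreement from each annotator's own category distribution.
    freq_a = Counter(annotations_a)
    freq_b = Counter(annotations_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages over all tokens.

    UAS counts tokens with the correct head; LAS additionally
    requires the correct dependency label.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    pair_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * head_hits / n, 100.0 * pair_hits / n


# Toy four-token sentence; heads are token indices, 0 marks the root.
annotator_a = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (3, "Atr")]
annotator_b = [(2, "Sb"), (0, "Pred"), (2, "Adv"), (3, "Atr")]
parser_out = [(2, "Sb"), (0, "Pred"), (2, "Adv"), (1, "Atr")]

kappa = cohen_kappa(annotator_a, annotator_b)
uas, las = attachment_scores(annotator_a, parser_out)
print(f"kappa={kappa:.3f} UAS={uas:.2f} LAS={las:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over plain percentage agreement when comparing tagsets of different sizes.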
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agić, Ž., Merkler, D. (2013). Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_70
DOI: https://doi.org/10.1007/978-3-642-40585-3_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science (R0)