Using Low-Cost Annotation to Train a Reliable Czech Shallow Parser

Radziszewski, Adam; Grác, Marek

doi:10.1007/978-3-642-40585-3_72


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

Bushbank is a relatively new concept: a type of annotated corpus in which annotation is driven by automatic tools and the task of human annotators is limited to accepting or rejecting parts of their output. This makes it possible to obtain annotated corpora of considerable size at relatively low cost.
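As a rough illustration of the accept/reject workflow described above, the following minimal sketch keeps only the tool-proposed chunks that an annotator accepted. The `Proposal` structure, its field names, and the `accepted_chunks` helper are hypothetical and are not taken from the Czech Bushbank tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Proposal:
    """A chunk proposed by an automatic tool (hypothetical structure)."""
    start: int                      # index of the first token in the chunk
    end: int                        # index one past the last token
    label: str                      # chunk type, e.g. "NP"
    verdict: Optional[bool] = None  # True = accepted, False = rejected, None = not judged


def accepted_chunks(proposals: List[Proposal]) -> List[Proposal]:
    """Keep only the proposals that an annotator explicitly accepted."""
    return [p for p in proposals if p.verdict is True]


proposals = [
    Proposal(0, 2, "NP", verdict=True),   # accepted by the annotator
    Proposal(2, 3, "VP", verdict=False),  # rejected by the annotator
    Proposal(3, 5, "NP"),                 # never judged, so left out
]
gold = accepted_chunks(proposals)
```

Under this assumption, unjudged proposals are treated as unknown rather than wrong, so the resulting annotation may be partial.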

In this paper we ask whether the Czech Bushbank is reliable enough to be used for an NLP task in place of a traditional corpus annotated with high rigour. We evaluate three different parsers using its shallow syntactic annotation, including a CRF chunker originally made for Polish. The results are very promising and suggest that many practical applications could benefit from low-cost annotation.
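The abstract does not state which evaluation measure is used. As a hedged sketch, shallow parsers are commonly compared by chunk-level precision, recall and F-measure over IOB2 tag sequences. The function names below are illustrative, and the choice of exact-match scoring is an assumption rather than the paper's protocol.

```python
def iob_to_chunks(tags):
    """Convert an IOB2 tag sequence into a set of (start, end, label) spans.

    An "I-X" tag that does not continue an open "X" chunk is leniently
    treated as the start of a new chunk.
    """
    chunks = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                chunks.add((start, i, label))     # close the open chunk
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]         # open a new chunk
            else:
                start, label = None, None
    return chunks


def chunk_prf(gold_tags, pred_tags):
    """Exact-match chunk precision, recall and F1 for one tag sequence."""
    gold = iob_to_chunks(gold_tags)
    pred = iob_to_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold_tags = ["B-NP", "I-NP", "O", "B-NP"]
pred_tags = ["B-NP", "I-NP", "O", "O"]
p, r, f = chunk_prf(gold_tags, pred_tags)  # one of two gold chunks is found
```

With accept/reject annotation, only part of the corpus may carry a verdict, so a real evaluation would need to decide how unjudged material is counted.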




Author information

Authors and Affiliations

Institute of Informatics, Wrocław University of Technology, Wrocław, Poland
Adam Radziszewski
Computational Linguistics Centre, Department of Czech Language, Faculty of Arts, Masaryk University, Brno, Czech Republic
Marek Grác


Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

Cite this paper

Radziszewski, A., Grác, M. (2013). Using Low-Cost Annotation to Train a Reliable Czech Shallow Parser. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_72

DOI: https://doi.org/10.1007/978-3-642-40585-3_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
