Abstract
This paper describes the methods and tools used for post-annotation checking of the Prague Dependency Treebank 2.0 data. The annotation process was complicated by many factors: the corpus is divided into several layers that must be consistent with one another; the annotation rules changed and evolved during annotation; and some parts of the data were annotated separately and in parallel and had to be merged with the rest of the data later. Converting the data from the old format to a new one was a further source of potential problems, in addition to ever-present human error. The checking procedures are classified according to several aspects, e.g. their linguistic relevance and their role in the checking process, and prominent examples are given. The last part of the paper compares and scores the methods.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Štěpánek, J. (2006). Post-annotation Checking of Prague Dependency Treebank 2.0 Data. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_35
DOI: https://doi.org/10.1007/11846406_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science (R0)