Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Abstract

The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators syntactically tagged a sample of the corpus so that we could measure the agreement rate between them and identify the aspects of the syntactic annotation process that need improvement. In this article we present the quantitative and qualitative results of this evaluation.
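As an illustration of how an agreement rate between two annotators can be quantified, the sketch below computes raw agreement corrected for chance with Cohen's kappa. The abstract does not say which measure the authors used, so the choice of kappa is an assumption; the function name `cohen_kappa` and the dependency labels (`subj`, `obj`, `mod`) are hypothetical placeholders, not EPEC's tagset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: both lists are aligned item by item (for example, one
    dependency label per token) and are non-empty.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label if each
    # annotator labelled at random according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one single label throughout; kappa is
        # undefined here, and this sketch reports full agreement.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical sample: six tokens labelled by two annotators.
annotator_1 = ["subj", "obj", "mod", "subj", "mod", "obj"]
annotator_2 = ["subj", "obj", "mod", "obj", "mod", "obj"]

print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

On this toy sample the annotators agree on five of six tokens, and the chance-corrected value is about 0.75. A real evaluation of dependency annotation would also have to decide whether agreement is counted on the label alone, on the head alone, or on both together.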

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Uria, L., Estarrona, A., Aldezabal, I., Aranzabe, M.J., Díaz de Ilarraza, A., Iruskieta, M. (2009). Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_6

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science (R0)
