Towards a Constraint Grammar Based Morphological Tagger for Croatian

  • Conference paper
Text, Speech and Dialogue (TSD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Abstract

A Constraint Grammar (CG) uses context-dependent hand-crafted rules to disambiguate the possible grammatical readings of words in running text. In this paper we describe the development of a CG-based morphological tagger for the Croatian language. Our CG tagger uses a morphological analyzer based on an automatically acquired inflectional lexicon and an elaborate tagset based on MULTEXT-East and the Croatian Verb Valence Lexicon. Currently our grammar has 290 rules, organized into cleanup and mapping rules, disambiguation rules, and heuristic rules. The grammar is implemented in the CG3 formalism and compiled with the vislcg3 open-source compiler. The preliminary tagging performance is P: 96.1%, R: 99.8% for POS tagging and P: 88.2%, R: 98.1% for complete morphosyntactic tagging.
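
To make the idea of context-dependent disambiguation concrete, the following is a minimal Python sketch and not the authors' grammar or the CG3 formalism. It assumes a toy tagset (`PREP`, `NOUN`, `VERB`, `ADJ`), a single hypothetical REMOVE-style rule, and the precision and recall definitions commonly used when a tagger may leave several readings per token (precision: correct readings kept divided by all readings output; recall: correct readings kept divided by the number of tokens). The abstract does not state how its figures were computed, so these definitions are an assumption.

```python
from typing import List, Set, Tuple

# Each token carries a "cohort": the set of readings still considered possible.
Cohort = Set[str]


def remove_if_prev(cohorts: List[Cohort], target: str, prev_tag: str) -> List[Cohort]:
    """Hypothetical REMOVE-style rule (illustrative only).

    Drops `target` from a cohort when the preceding token is unambiguously
    `prev_tag`. The last remaining reading of a token is never removed.
    """
    out = [set(c) for c in cohorts]  # work on a copy
    for i in range(1, len(out)):
        prev_is_unambiguous = out[i - 1] == {prev_tag}
        if prev_is_unambiguous and target in out[i] and len(out[i]) > 1:
            out[i].discard(target)
    return out


def precision_recall(cohorts: List[Cohort], gold: List[str]) -> Tuple[float, float]:
    """Precision and recall for output that may remain ambiguous (assumed definitions)."""
    kept_correct = sum(1 for c, g in zip(cohorts, gold) if g in c)
    total_readings = sum(len(c) for c in cohorts)
    return kept_correct / total_readings, kept_correct / len(gold)


# Toy three-token sentence with made-up readings and gold tags.
ambiguous = [{"PREP"}, {"NOUN", "VERB"}, {"VERB", "ADJ"}]
gold = ["PREP", "NOUN", "VERB"]

# Rule: a verb reading is removed directly after an unambiguous preposition.
disambiguated = remove_if_prev(ambiguous, target="VERB", prev_tag="PREP")

p_before, r_before = precision_recall(ambiguous, gold)
p_after, r_after = precision_recall(disambiguated, gold)
print(f"before: P={p_before:.2f} R={r_before:.2f}")
print(f"after:  P={p_after:.2f} R={r_after:.2f}")
```

In this toy case the rule raises precision while leaving recall unchanged, which mirrors the usual design goal of such rules: discard wrong readings only when the context makes that safe, and otherwise leave the ambiguity in place.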

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peradin, H., Šnajder, J. (2012). Towards a Constraint Grammar Based Morphological Tagger for Croatian. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_21

  • DOI: https://doi.org/10.1007/978-3-642-32790-2_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32789-6

  • Online ISBN: 978-3-642-32790-2

  • eBook Packages: Computer Science, Computer Science (R0)
