Morphosyntactic Tagging of Slovene Using Progol

  • Conference paper
Inductive Logic Programming (ILP 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1634)

Abstract

We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. In the case of Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (consisting of 161,314 examples) produced by the MULTEXT-East project. P-Progol produced 1,148 rules taking 36 hours. Using simple grammatical background knowledge, e.g. looking for case disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These rules have proved effective at detecting and explaining incorrect MSD annotations in an independent test set, but have not so far produced a tagger comparable to other existing taggers in terms of accuracy.
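The idea of "looking for case disagreement" can be illustrated with a minimal sketch. It is not taken from the paper: the `MSD` class, its attribute names, and the `prune_readings` helper are hypothetical simplifications, and the assumption that an adjective must agree in case with the noun that follows it is a hand-written stand-in for the kind of background knowledge the abstract mentions, not one of the rules P-Progol induced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MSD:
    """Hypothetical, simplified MSD: part of speech plus optional attributes."""
    pos: str                       # e.g. "Noun", "Adjective"
    gender: Optional[str] = None   # None when the attribute does not apply
    case: Optional[str] = None


def case_disagreement(left: MSD, right: MSD) -> bool:
    """Return True when both MSDs carry a case value and the values differ."""
    return (
        left.case is not None
        and right.case is not None
        and left.case != right.case
    )


def prune_readings(adjective_readings: List[MSD], noun_readings: List[MSD]) -> List[MSD]:
    """Keep adjective readings that agree in case with at least one noun reading.

    Assumes (for illustration only) that the adjective directly modifies the noun.
    """
    return [
        adj for adj in adjective_readings
        if any(not case_disagreement(adj, noun) for noun in noun_readings)
    ]
```

Given an ambiguous adjective with a nominative and an accusative reading, followed by a noun that can only be accusative, `prune_readings` would discard the nominative reading. A learned disambiguation rule would play a similar role, but with conditions found from the annotated data rather than written by hand.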

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cussens, J., Džeroski, S., Erjavec, T. (1999). Morphosyntactic Tagging of Slovene Using Progol. In: Džeroski, S., Flach, P. (eds) Inductive Logic Programming. ILP 1999. Lecture Notes in Computer Science, vol 1634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48751-4_8

  • DOI: https://doi.org/10.1007/3-540-48751-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66109-2

  • Online ISBN: 978-3-540-48751-7

  • eBook Packages: Springer Book Archive
