Abstract
We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. For Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (161,314 examples) produced by the MULTEXT-East project. P-Progol produced 1,148 rules in 36 hours. Using simple grammatical background knowledge, e.g. looking for case disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These rules have proved effective at detecting and explaining incorrect MSD annotations in an independent test set, but have not yet produced a tagger whose accuracy is comparable to that of existing taggers.
© 1999 Springer-Verlag Berlin Heidelberg
Cussens, J., Džeroski, S., Erjavec, T. (1999). Morphosyntactic Tagging of Slovene Using Progol. In: Džeroski, S., Flach, P. (eds) Inductive Logic Programming. ILP 1999. Lecture Notes in Computer Science, vol 1634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48751-4_8
DOI: https://doi.org/10.1007/3-540-48751-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66109-2
Online ISBN: 978-3-540-48751-7
eBook Packages: Springer Book Archive