Abstract
We describe and evaluate a set of heuristics, algorithmic procedures developed as part of IceTagger, a linguistic rule-based tagger for part-of-speech (POS) tagging of Icelandic text. The heuristics mark grammatical functions and prepositional phrases and use this information to force feature agreement where appropriate. They run after the local rules, i.e. rules that perform initial disambiguation based on local context. Our evaluation shows that two of the heuristics, which guess the subjects and objects of verbs, achieve relatively high accuracy compared with the results of parsing-based systems. Similar heuristics could be used for POS tagging of text in other morphologically complex languages.
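As a rough illustration of the idea summarised above, the following Python sketch guesses a subject for a verb and then forces agreement between the two. It is a toy under stated assumptions, not IceTagger's actual implementation: the `Token` class, the three-field tag (`pos`, `case`, `number`), the nearest-nominative-noun-to-the-left rule, and the placeholder word forms are all hypothetical simplifications introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical, simplified tag: (part of speech, case, number).
# Real Icelandic tags carry more features; verbs have no case, so "-" is used.
Tag = Tuple[str, str, str]


@dataclass
class Token:
    word: str
    tags: Set[Tag]  # candidate tags assumed to remain after the local rules


def guess_subject(tokens: List[Token], verb_idx: int) -> Optional[int]:
    """Guess the subject of the verb at verb_idx: the nearest token to its
    left that still has a nominative noun reading (an assumed toy rule)."""
    for i in range(verb_idx - 1, -1, -1):
        if any(pos == "noun" and case == "nom" for pos, case, _ in tokens[i].tags):
            return i
    return None


def force_agreement(subject: Token, verb: Token) -> None:
    """Keep only nominative noun readings of the subject and verb readings
    that agree with it in number. If no compatible pair exists, do nothing."""
    subj_tags = {t for t in subject.tags if t[0] == "noun" and t[1] == "nom"}
    verb_tags = {t for t in verb.tags if t[0] == "verb"}
    shared = {t[2] for t in subj_tags} & {t[2] for t in verb_tags}
    if not shared:
        return
    subject.tags = {t for t in subj_tags if t[2] in shared}
    verb.tags = {t for t in verb_tags if t[2] in shared}


def apply_heuristics(tokens: List[Token]) -> None:
    """Run the subject heuristic for every token that has a verb reading."""
    for i, tok in enumerate(tokens):
        if any(pos == "verb" for pos, _, _ in tok.tags):
            subj = guess_subject(tokens, i)
            if subj is not None:
                force_agreement(tokens[subj], tok)


# Usage with invented ambiguity: the noun is ambiguous in case,
# the verb is ambiguous in number.
tokens = [
    Token("noun_form", {("noun", "nom", "pl"), ("noun", "acc", "pl")}),
    Token("verb_form", {("verb", "-", "sg"), ("verb", "-", "pl")}),
]
apply_heuristics(tokens)
```

In this sketch the guessed grammatical function does the disambiguation: once the noun is taken to be the subject, its accusative reading and the verb's singular reading are both discarded. When no compatible pair of readings exists, the candidate tags are left untouched rather than emptied.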
The author would like to thank Professor Yorick Wilks for valuable comments and suggestions in the preparation of this paper. The author also thanks the Institute of Lexicography at the University of Iceland for kindly providing access to the IFD corpus used in this research.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Loftsson, H. (2006). Tagging a Morphologically Complex Language Using Heuristics. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_64
DOI: https://doi.org/10.1007/11816508_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science (R0)