Tagging a Morphologically Complex Language Using Heuristics

  • Conference paper
Advances in Natural Language Processing (FinTAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Abstract

We describe and evaluate heuristics, a collection of algorithmic procedures, which have been developed as a part of a linguistic rule-based tagger, IceTagger, for POS tagging Icelandic text. The purpose of the heuristics is to mark grammatical functions and prepositional phrases, and use this information to force feature agreement where appropriate. The heuristics are run after the application of local rules, i.e. rules which perform initial disambiguation based on a local context. Evaluation shows that the accuracy of two of the heuristics, which guess subjects and objects of verbs, is relatively high when compared to the results of parsing-based systems. Similar heuristics could be used for POS tagging texts in other morphologically complex languages.
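
The idea of guessing a subject and then forcing feature agreement can be illustrated with a minimal sketch. This is not IceTagger's implementation: the tag representation (feature dictionaries), the function names `guess_subject` and `force_agreement`, and the "nearest preceding nominative nominal" rule are all assumptions made for illustration, and the words `W1` and `W2` are placeholders rather than real Icelandic forms.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    # Candidate tags left after local disambiguation; each tag is a
    # hypothetical feature dictionary, e.g. {"pos": "noun", "case": "nom"}.
    tags: list


def is_nominative_nominal(tag):
    """True for a noun or pronoun reading in the nominative case."""
    return tag["pos"] in ("noun", "pronoun") and tag.get("case") == "nom"


def guess_subject(tokens, verb_index):
    """Guess the subject of a verb: the nearest preceding token that
    still has a nominative nominal reading (an assumed heuristic)."""
    for i in range(verb_index - 1, -1, -1):
        if any(is_nominative_nominal(t) for t in tokens[i].tags):
            return i
    return None


def force_agreement(tokens):
    """Prune verb and subject readings so that they agree in number."""
    for vi, verb in enumerate(tokens):
        if not any(t["pos"] == "verb" for t in verb.tags):
            continue
        si = guess_subject(tokens, vi)
        if si is None:
            continue
        subject = tokens[si]
        subj_tags = [t for t in subject.tags if is_nominative_nominal(t)]
        subj_numbers = {t.get("number") for t in subj_tags}
        verb_tags = [
            t for t in verb.tags
            if t["pos"] == "verb" and t.get("number") in subj_numbers
        ]
        # Prune only when at least one agreeing verb reading survives.
        if verb_tags:
            verb_numbers = {t.get("number") for t in verb_tags}
            verb.tags = verb_tags
            subject.tags = [
                t for t in subj_tags if t.get("number") in verb_numbers
            ]
    return tokens


sentence = [
    Token("W1", [
        {"pos": "noun", "case": "nom", "number": "sg"},
        {"pos": "noun", "case": "acc", "number": "pl"},
    ]),
    Token("W2", [
        {"pos": "verb", "number": "sg"},
        {"pos": "verb", "number": "pl"},
    ]),
]
force_agreement(sentence)
```

In this toy sentence each token ends up with a single reading: once `W1` is marked as the subject, its accusative plural reading is dropped, and only the singular reading of `W2` agrees with it. Pruning only when some reading survives keeps a wrong subject guess from emptying a token's tag list.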

The author would like to thank Professor Yorick Wilks for valuable comments and suggestions in the preparation of this paper. The author also thanks the Institute of Lexicography at the University of Iceland for kindly providing access to the IFD corpus used in this research.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Loftsson, H. (2006). Tagging a Morphologically Complex Language Using Heuristics. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_64

  • DOI: https://doi.org/10.1007/11816508_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37334-6

  • Online ISBN: 978-3-540-37336-0

  • eBook Packages: Computer Science, Computer Science (R0)
