Guessers for Finite-State Transducer Lexicons

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Abstract

Language software applications encounter new words, e.g., acronyms, technical terminology, names, or compounds of such words. To add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e., a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on full-scale Finnish, English, and Swedish transducer lexicons. We use the open-source Helsinki Finite-State Technology [1] to create finite-state transducer lexicons from existing lexical resources and to automatically derive guessers for unknown words. The method has a recall of 82–87% and a precision of 71–76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.

References

  1. HFST–Helsinki Finite-State Technology, http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/index.shtml

  2. Mikheev, A.: Unsupervised Learning of Word-Category Guessing Rules. In: Proc. of the 34th Annual Meeting of the Association for Computational Linguistics (ACL 1996), pp. 327–334 (1996)

  3. Oflazer, K., Nirenburg, S., McShane, M.: Bootstrapping Morphological Analyzers by Combining Human Elicitation and Machine Learning. Comp. Ling. 27(1), 59–85 (2001)

  4. Wicentowski, R.: Modeling and Learning Multilingual Inflectional Morphology in a Minimally Supervised Framework. PhD Thesis, Baltimore, USA (2002)

  5. Goldsmith, J.A.: Morphological Analogy: Only a Beginning (2007), http://hum.uchicago.edu/~jagoldsm/Papers/analogy.pdf

  6. Creutz, M., Hirsimäki, T., Kurimo, M., Puurula, A., Pylkkönen, J., Siivola, V., Varjokallio, M., Arisoy, E., Saraçlar, M., Stolcke, A.: Morph-based speech recognition and modeling of out-of-vocabulary words across languages. ACM Trans. on Speech and Lang. Proc. 5(1), art. 3 (2007)

  7. Kurimo, M., Creutz, M., Turunen, V.: Overview of Morpho Challenge in CLEF 2007. In: Peters, C., Jijkoun, V., Mandl, T., Müller, H., Oard, D.W., Peñas, A., Petras, V., Santos, D. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 19–21. Springer, Heidelberg (2008)

  8. Lindén, K.: A probabilistic model for guessing base forms of new words by analogy. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 106–116. Springer, Heidelberg (2008)

  9. Kuenning, G.: Dictionaries for International Ispell (2007), http://www.lasr.cs.ucla.edu/geoff/ispell-dictionaries.html

  10. Lingsoft, Inc.: Demos, http://www.lingsoft.fi/?doc_id=107&lang=en

  11. Koskenniemi, K.: Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. Department of General Linguistics, University of Helsinki, Publication No. 11 (1983)

  12. Karlsson, F.: SWETWOL: A Comprehensive Morphological Analyser for Swedish. Nordic Journal of Linguistics 15(1), 1–45 (1992)

  13. Nykysuomen sanalista, http://kaino.kotus.fi/sanat/nykysuomi/

  14. FreeLing 2.1–An Open Source Suite of Language Analyzers, http://garraf.epsevg.upc.es/freeling/

  15. Westerberg, T.: Den stora svenska ordlistan (2008), http://www.dsso.se/

  16. Mikheev, A.: Automatic Rule Induction for Unknown-Word Guessing. Comp. Ling. 23(3), 405–423 (1997)

  17. Stroppa, N., Yvon, F.: An Analogical Learner for Morphological Analysis. In: Proc. of the 9th Conference on Computational Natural Language Learning (CoNLL), pp. 120–127 (2005)

  18. Wicentowski, R.: Multilingual Noise-Robust Supervised Morphological Analysis using the WordFrame Model. In: Proc. of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology, ACL, pp. 70–77 (2004)

  19. Claveau, V., L’Homme, M.C.: Structuring Terminology using Analogy-Based Machine Learning. In: Proceedings of the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005, pp. 17–18 (2005)

  20. Baldwin, T.: Bootstrapping Deep Lexical Resources: Resources for Courses. In: Proc. of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, ACL, pp. 67–76 (2005)

  21. Daelemans, W., Zavrel, J., Sloot, K., Bosch, A.: TiMBL: Tilburg Memory-Based Learner, version 6.0, Reference Guide. Technical Report ILK07-03, Department of Communication and Information Sciences, Tilburg University (2003)

  22. Pirinen, T.: Open Source Morphology for Finnish using Finite-State Methods (in Finnish). Technical Report. Department of Linguistics, University of Helsinki (2008)

  23. Sakarovitch, J.: Éléments de théorie des automates. Vuibert (2003)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lindén, K. (2009). Guessers for Finite-State Transducer Lexicons. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_13

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
