Abstract
HFST (Helsinki Finite-State Technology, http://hfst.sf.net/) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications. In this article, we focus on aspects of HFST that are new to the end user, i.e. new tools, new features in existing tools, or new language applications, in addition to some revised algorithms that increase performance.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lindén, K., Axelson, E., Drobac, S., Hardwick, S., Silfverberg, M., Pirinen, T.A. (2013). Using HFST for Creating Computational Linguistic Applications. In: Przepiórkowski, A., Piasecki, M., Jassem, K., Fuglewicz, P. (eds) Computational Linguistics. Studies in Computational Intelligence, vol 458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34399-5_1
DOI: https://doi.org/10.1007/978-3-642-34399-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34398-8
Online ISBN: 978-3-642-34399-5
eBook Packages: Engineering, Engineering (R0)