Finite Automata for Compact Representation of Language Models in NLP

  • Conference paper
Implementation and Application of Automata (CIAA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2494)

Abstract

A technique for the compact representation of language models in Natural Language Processing is presented. After a brief review of the motivation for a more compact representation, it is shown how finite-state automata can be used to represent such models compactly. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary experiments indicate that the technique yields considerable space savings, of up to 90% in practice.
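
The abstract refers to perfect hashing by means of finite-state automata. As a rough illustration of that general idea only (not the authors' implementation), the sketch below assigns every word in a fixed vocabulary a unique index equal to its rank in sorted order, so that associated values can be stored in a dense array instead of alongside the strings. It is a minimal sketch under stated assumptions: it uses a plain, unminimized trie for brevity, and the names `Node`, `build`, and `word_to_index` as well as the sample vocabulary are hypothetical.

```python
from typing import Dict, Iterable, Optional


class Node:
    """One automaton state; `count` is the number of words accepted from here."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False
        self.count = 0


def _annotate(node: Node) -> int:
    """Store in each state the size of the language accepted from that state."""
    node.count = int(node.final) + sum(_annotate(c) for c in node.edges.values())
    return node.count


def build(words: Iterable[str]) -> Node:
    """Build a trie acceptor (assumption: no minimization) and annotate counts."""
    root = Node()
    for word in set(words):
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    _annotate(root)
    return root


def word_to_index(root: Node, word: str) -> Optional[int]:
    """Return the rank of `word` in sorted order, or None if it is not accepted."""
    node, index = root, 0
    for ch in word:
        if ch not in node.edges:
            return None
        if node.final:
            index += 1  # a shorter word ending at this state sorts first
        # Skip every word reachable through a smaller outgoing label.
        index += sum(c.count for label, c in node.edges.items() if label < ch)
        node = node.edges[ch]
    return index if node.final else None


# Hypothetical vocabulary; values live in a dense list addressed by the hash.
root = build(["de", "den", "het", "huis"])
values = [0] * root.count
values[word_to_index(root, "het")] = 42
```

Because each stored count depends only on the set of suffixes accepted from a state, the same numbering also works when equivalent states are merged in a minimized automaton; the trie is used here only to keep the example short.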

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Daciuk, J., van Noord, G. (2002). Finite Automata for Compact Representation of Language Models in NLP. In: Watson, B.W., Wood, D. (eds) Implementation and Application of Automata. CIAA 2001. Lecture Notes in Computer Science, vol 2494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36390-4_6

  • DOI: https://doi.org/10.1007/3-540-36390-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00400-4

  • Online ISBN: 978-3-540-36390-3

  • eBook Packages: Springer Book Archive
