Abstract
A technique for the compact representation of language models in Natural Language Processing is presented. After a brief review of the motivation for representing such models more compactly, it is shown how finite-state automata can be used for this purpose. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary experiments indicate that the technique yields space savings of up to 90%.
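To make the idea of perfect hashing by means of finite-state automata concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual word-numbering scheme, in which each state stores the number of words accepted from it, so that every word maps to its rank in sorted order and back. For brevity the automaton is an unminimised trie; the same numbering works on a minimal acyclic automaton. All names (`State`, `build`, `word_to_index`, `index_to_word`) are hypothetical.

```python
class State:
    """One automaton state; `count` is the number of words accepted from here."""
    __slots__ = ("edges", "final", "count")

    def __init__(self):
        self.edges = {}     # character -> State
        self.final = False  # True if a word ends in this state
        self.count = 0


def _count(state):
    # Post-order pass: words reachable = own finality + sum over successors.
    state.count = int(state.final) + sum(_count(t) for t in state.edges.values())
    return state.count


def build(words):
    """Build a trie over `words` (assumption: a trie stands in for a minimal DFA)."""
    root = State()
    for word in set(words):
        s = root
        for ch in word:
            s = s.edges.setdefault(ch, State())
        s.final = True
    _count(root)
    return root


def word_to_index(root, word):
    """Return the rank of `word` among the stored words, or None if absent."""
    idx, s = 0, root
    for ch in word:
        if ch not in s.edges:
            return None
        if s.final:
            idx += 1  # the prefix read so far is itself a word and sorts first
        for c in sorted(s.edges):
            if c == ch:
                break
            idx += s.edges[c].count  # skip all words under smaller labels
        s = s.edges[ch]
    return idx if s.final else None


def index_to_word(root, idx):
    """Inverse mapping: return the word with rank `idx`, or None if out of range."""
    if not 0 <= idx < root.count:
        return None
    s, out = root, []
    while True:
        if s.final:
            if idx == 0:
                return "".join(out)
            idx -= 1
        for c in sorted(s.edges):
            t = s.edges[c]
            if idx < t.count:
                out.append(c)
                s = t
                break
            idx -= t.count
```

Because every word receives a unique integer in a dense range, an n-gram can be stored as a tuple of small integers instead of strings, which is one plausible source of the space savings the abstract mentions.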
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Daciuk, J., van Noord, G. (2002). Finite Automata for Compact Representation of Language Models in NLP. In: Watson, B.W., Wood, D. (eds) Implementation and Application of Automata. CIAA 2001. Lecture Notes in Computer Science, vol 2494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36390-4_6
DOI: https://doi.org/10.1007/3-540-36390-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00400-4
Online ISBN: 978-3-540-36390-3
eBook Packages: Springer Book Archive