Finite Automata for Compact Representation of Language Models in NLP

Daciuk, Jan; van Noord, Gertjan

doi:10.1007/3-540-36390-4_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2494)

Included in the following conference series:

International Conference on Implementation and Application of Automata

Abstract

This paper presents a technique for the compact representation of language models in natural language processing. After a brief review of the motivation for more compact representations, it shows how finite-state automata can be used to represent such models compactly. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary experiments indicate that the technique yields considerable space savings, of up to 90% in practice.
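To make the idea of perfect hashing with an automaton concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: it uses a plain trie (a deterministic acyclic automaton that has not been minimized), and the names State, build, and key_to_index, as well as the toy keys and counts, are hypothetical. Each state stores how many keys are accepted from it, so a key's position in sorted order can be computed while the key is being recognized. That position can then index a dense array of model values.

```python
class State:
    """One automaton state (hypothetical helper, not from the paper)."""
    def __init__(self):
        self.edges = {}     # transition label -> target State
        self.final = False  # True if a key ends in this state
        self.count = 0      # number of keys accepted starting from this state


def _annotate(state):
    # Post-order pass: count the keys reachable from each state.
    state.count = int(state.final) + sum(_annotate(t) for t in state.edges.values())
    return state.count


def build(keys):
    """Build a trie over the keys and annotate every state with its key count."""
    root = State()
    for key in keys:
        node = root
        for ch in key:
            node = node.edges.setdefault(ch, State())
        node.final = True
    _annotate(root)
    return root


def key_to_index(root, key):
    """Return the 0-based rank of key among the stored keys, or None if absent."""
    index = 0
    state = root
    for ch in key:
        if ch not in state.edges:
            return None
        if state.final:
            index += 1  # a proper prefix that is itself a key sorts earlier
        for label, target in state.edges.items():
            if label < ch:
                index += target.count  # skip all keys under smaller labels
        state = state.edges[ch]
    return index if state.final else None


# Toy data, made up for illustration only.
keys = ["de", "den", "het", "huis"]
counts = [120, 15, 80, 7]  # counts[i] belongs to the i-th key in sorted order
root = build(keys)
i = key_to_index(root, "het")
print(counts[i] if i is not None else "unknown key")
```

Because the mapping from keys to indices is carried by the automaton itself, the keys do not have to be stored alongside the values. A minimized automaton would additionally share common suffixes, which this sketch does not attempt.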

Author information

Authors and Affiliations

Alfa Informatica, Rijksuniversiteit Groningen, Oude Kijk in 't Jatstraat 26, Postbus 716, 9700 AS, Groningen, the Netherlands
Jan Daciuk & Gertjan van Noord

Editor information

Editors and Affiliations

Department of Computer Science, University of Pretoria, Lynwood Road, Pretoria, 0002, South Africa
Bruce W. Watson
Department of Computer Science, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong
Derick Wood

About this paper

Cite this paper

Daciuk, J., van Noord, G. (2002). Finite Automata for Compact Representation of Language Models in NLP. In: Watson, B.W., Wood, D. (eds) Implementation and Application of Automata. CIAA 2001. Lecture Notes in Computer Science, vol 2494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36390-4_6

DOI: https://doi.org/10.1007/3-540-36390-4_6
Published: 18 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00400-4
Online ISBN: 978-3-540-36390-3
eBook Packages: Springer Book Archive
