On Compact Storage Models for Gazetteers

  • Conference paper
Finite-State Methods and Natural Language Processing (FSMNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4002)

Abstract

This paper describes compact storage models for gazetteers using state-of-the-art finite-state technology. In particular, we compare the standard method, based on numbered indexing automata associated with an auxiliary storage device, with a pure finite-state representation; the latter proves superior in terms of space and time complexity when applied to real-world test data. Further, we pinpoint some pros and cons of both approaches and provide results of empirical experiments, which form handy guidelines for selecting a suitable data structure for implementing a gazetteer.
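To make the two storage models in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes plain tries instead of minimized automata, and every name in it (`NumberedTrie`, `build_pure`, `lookup_pure`, `SEP`, the sample entries) is hypothetical. The first model numbers the entries through per-state counts and keeps the annotations in a separate array; the second encodes each annotation directly on the path that follows the entry.

```python
SEP = "\t"  # hypothetical separator, assumed never to occur inside an entry


class NumberedTrie:
    """Trie whose nodes count the entries at or below them (indexing model)."""

    def __init__(self, words):
        self.root = self._new_node()
        for word in set(words):
            node = self.root
            node["count"] += 1
            for ch in word:
                node = node["next"].setdefault(ch, self._new_node())
                node["count"] += 1
            node["final"] = True

    @staticmethod
    def _new_node():
        return {"final": False, "count": 0, "next": {}}

    def index(self, word):
        """Return the rank of `word` among the sorted entries, or None."""
        node, rank = self.root, 0
        for ch in word:
            if ch not in node["next"]:
                return None
            if node["final"]:
                rank += 1  # a shorter entry ending here sorts first
            for label in sorted(node["next"]):
                if label == ch:
                    break
                rank += node["next"][label]["count"]  # skip smaller branches
            node = node["next"][ch]
        return rank if node["final"] else None


def lookup_numbered(trie, aux, name):
    """Indexing model: the rank selects a slot in the auxiliary array."""
    i = trie.index(name)
    return aux[i] if i is not None else []


def build_pure(entries):
    """Pure finite-state model: store 'entry SEP annotation' as one path."""
    root = {}
    for name, tags in entries.items():
        for tag in tags:
            node = root
            for ch in name + SEP + tag:
                node = node.setdefault(ch, {})
            node[None] = True  # end-of-path marker
    return root


def lookup_pure(root, name):
    """Walk the entry and the separator, then read off every annotation."""
    node = root
    for ch in name + SEP:
        node = node.get(ch)
        if node is None:
            return []
    found, stack = [], [(node, "")]
    while stack:
        current, prefix = stack.pop()
        for label, child in current.items():
            if label is None:
                found.append(prefix)
            else:
                stack.append((child, prefix + label))
    return sorted(found)


# Sample data, invented for illustration only.
gazetteer = {"Berlin": ["city"], "Bern": ["city"], "Paris": ["city", "given_name"]}
names = sorted(gazetteer)
numbered = NumberedTrie(names)
aux = [gazetteer[n] for n in names]  # auxiliary storage, aligned with ranks
pure = build_pure(gazetteer)
```

Both lookups return the same annotations for a given entry; the sketch only illustrates where the annotations live in each model and says nothing about the space or time results reported in the paper.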


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Piskorski, J. (2006). On Compact Storage Models for Gazetteers. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds) Finite-State Methods and Natural Language Processing. FSMNLP 2005. Lecture Notes in Computer Science, vol 4002. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780885_22

  • DOI: https://doi.org/10.1007/11780885_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35467-3

  • Online ISBN: 978-3-540-35469-7

  • eBook Packages: Computer Science, Computer Science (R0)
