
Indexing Text Using the Ziv-Lempel Trie

  • Conference paper
  • String Processing and Information Retrieval (SPIRE 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2476)


Abstract

Let a text of $u$ characters over an alphabet of size $\sigma$ be compressible to $n$ symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the Ziv-Lempel trie that takes $4n\log_2 n\,(1+o(1))$ bits of space and reports the $R$ occurrences of a pattern of length $m$ in worst-case time $O(m^2 \log(m\sigma) + (m+R)\log n)$.
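To make "compressible to n symbols" and "Ziv-Lempel trie" concrete, here is a minimal Python sketch of LZ78 parsing. It is not the paper's index; it only builds the trie of LZ78 phrases on which the index is based. Each new phrase is the longest previously seen phrase extended by one character, so every phrase becomes one new trie node and n is the number of phrases. Assumptions: the text ends with a terminator (here `$`) that occurs nowhere else, and the function names `lz78_parse`, `phrase_text` and `leading_space_bits` are hypothetical. The LZW variant (whose dictionary starts with all σ single characters) is not shown.

```python
import math


def lz78_parse(text):
    """LZ78-parse `text` and build its Ziv-Lempel trie.

    Assumes `text` ends with a terminator that occurs nowhere else, so the
    last phrase is always new. Returns (phrases, trie), where phrase k
    (1-based) is stored as (parent_id, char) and trie maps
    (node_id, char) -> child node_id, with node 0 the empty phrase (root).
    """
    trie = {}
    phrases = []
    node = 0  # current trie node; 0 means "at the root"
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child  # phrase so far was seen before: keep extending it
            continue
        phrases.append((node, ch))       # new phrase = known phrase + one char
        trie[(node, ch)] = len(phrases)  # ...and it becomes a new trie node
        node = 0                         # the next phrase starts at the root
    if node != 0:
        raise ValueError("text must end with a unique terminator")
    return phrases, trie


def phrase_text(phrases, k):
    """Spell out phrase k by walking from its trie node up to the root."""
    chars = []
    while k != 0:
        parent, ch = phrases[k - 1]
        chars.append(ch)
        k = parent
    return "".join(reversed(chars))


def leading_space_bits(n):
    """Leading term 4 n log2 n of the index size (the o(1) factor is dropped)."""
    return 4 * n * math.log2(n)


if __name__ == "__main__":
    phrases, trie = lz78_parse("abracadabra$")
    print([phrase_text(phrases, k) for k in range(1, len(phrases) + 1)])
    # ['a', 'b', 'r', 'ac', 'ad', 'ab', 'ra', '$']
    print(len(phrases), leading_space_bits(len(phrases)))
    # 8 96.0
```

On the 12-character example the parse yields n = 8 phrases, and their concatenation spells the text back. Keying the trie by (node, character) pairs in a dictionary is only the simplest representation for illustration; it is not a space-efficient encoding of the kind the stated bit bound refers to.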

Partially supported by Fondecyt Grant 1-020831





Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navarro, G. (2002). Indexing Text Using the Ziv-Lempel Trie. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_28


  • DOI: https://doi.org/10.1007/3-540-45735-6_28


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44158-8

  • Online ISBN: 978-3-540-45735-0

  • eBook Packages: Springer Book Archive
