Indexing Text Using the Ziv-Lempel Trie

Navarro, Gonzalo

doi:10.1007/3-540-45735-6_28

Gonzalo Navarro

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2476))

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Let a text of u characters over an alphabet of size σ be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the Ziv-Lempel trie that takes 4n log₂ n (1 + o(1)) bits of space and reports the R occurrences of a pattern of length m in worst-case time O(m² log(mσ) + (m + R) log n).
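
To make the quantities u and n in the abstract concrete, the following is a minimal sketch of a plain LZ78 parse that builds the Ziv-Lempel trie as it goes. It is not the index described in the paper, only the standard parsing step the index is based on. The function names `lz78_parse` and `phrase_string`, and the choice to emit a final phrase with an empty character when the text ends inside an existing phrase, are assumptions made for this illustration.

```python
def lz78_parse(text):
    """Parse `text` into LZ78 phrases.

    Each phrase is stored as (parent_id, char): the phrase equals the
    phrase numbered parent_id followed by char. Phrase 0 is the empty
    string (the trie root); real phrases are numbered from 1.
    """
    trie = {}      # (parent_id, char) -> phrase_id: the edges of the trie
    phrases = []   # phrases[i - 1] describes phrase i
    node = 0       # current trie node, starting at the root
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the current match
        else:
            phrases.append((node, ch))        # new phrase = longest match + ch
            trie[(node, ch)] = len(phrases)
            node = 0                          # restart at the root
    if node != 0:
        # Assumption for this sketch: the text ended inside an existing
        # phrase, so emit a last phrase with no extra character.
        phrases.append((node, ""))
    return phrases


def phrase_string(phrases, i):
    """Spell out phrase i by following parent pointers up to the root."""
    out = []
    while i != 0:
        parent, ch = phrases[i - 1]
        out.append(ch)
        i = parent
    return "".join(reversed(out))


if __name__ == "__main__":
    text = "abababab"
    phrases = lz78_parse(text)
    u, n = len(text), len(phrases)
    print(u, n)  # u = 8 characters, n = 5 phrases: a | b | ab | aba | b
    print([phrase_string(phrases, i) for i in range(1, n + 1)])
```

Here u is the text length and n is the number of phrases, which is also the number of trie nodes excluding the root. The space bound in the abstract is stated in terms of n, so a more repetitive text yields a smaller structure.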

Partially supported by Fondecyt Grant 1-020831

Author information

Authors and Affiliations

Dept. of Computer Science, Univ. of Chile, Blanco Encalada, 2120, Santiago, Chile
Gonzalo Navarro

Editor information

Editors and Affiliations

Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, 31270-901, Belo Horizonte, MG, Brazil
Alberto H. F. Laender
Instituto Superior Técnico, INESC-ID, R. Alves Redol 9, 1000-029, Lisboa, Portugal
Arlindo L. Oliveira

Cite this paper

Navarro, G. (2002). Indexing Text Using the Ziv-Lempel Trie. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_28

DOI: https://doi.org/10.1007/3-540-45735-6_28
Published: 18 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44158-8
Online ISBN: 978-3-540-45735-0
eBook Packages: Springer Book Archive
