Abstract
Let a text of $u$ characters over an alphabet of size $\sigma$ be compressible to $n$ symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the Ziv-Lempel trie that takes $4n \log_2 n\,(1 + o(1))$ bits of space and reports the $R$ occurrences of a pattern of length $m$ in worst-case time $O(m^2 \log(m\sigma) + (m+R) \log n)$.
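To make the quantities in the abstract concrete, the following is a minimal sketch of plain LZ78 parsing, which is where the Ziv-Lempel trie and the phrase count n come from. It is not the index structure of the paper: the function names (`lz78_parse`, `lz78_decode`, `leading_space_bits`) and the dictionary-based trie are assumptions made for illustration, and the space function evaluates only the leading term 4n log2 n, ignoring the o(1) factor.

```python
import math


def lz78_parse(text):
    """Split text into LZ78 phrases and build the corresponding trie.

    Each phrase is a pair (parent_id, char): the earlier phrase it extends
    plus one new character. Id 0 is the trie root (the empty phrase).
    The trie is stored as a dict mapping (parent_id, char) -> child_id.
    """
    trie = {}
    phrases = []
    node = 0  # current position in the trie, starting at the root
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child  # the current phrase can still be extended
        else:
            # New phrase: longest known prefix plus one new character.
            trie[(node, ch)] = len(phrases) + 1
            phrases.append((node, ch))
            node = 0
    if node != 0:
        # The text ended inside an already known phrase; emit it without
        # a new character (an illustrative convention, not from the paper).
        phrases.append((node, ""))
    return phrases, trie


def lz78_decode(phrases):
    """Rebuild the original text from the list of (parent_id, char) pairs."""
    strings = [""]  # strings[i] is the text of phrase i; phrase 0 is empty
    for parent, ch in phrases:
        strings.append(strings[parent] + ch)
    return "".join(strings[1:])


def leading_space_bits(n):
    """Leading term 4 n log2 n of the space bound, without the o(1) factor."""
    return 4 * n * math.log2(n)


if __name__ == "__main__":
    sample = "alabaralalabarda"
    phrases, _ = lz78_parse(sample)
    print(len(sample), "characters parsed into", len(phrases), "phrases")
    print("leading-term space estimate in bits:", leading_space_bits(len(phrases)))
```

Here u is `len(sample)` and n is `len(phrases)`; on repetitive text n grows more slowly than u, which is why a bound stated in terms of n can be smaller than one stated in terms of u.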
Partially supported by Fondecyt Grant 1-020831
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Navarro, G. (2002). Indexing Text Using the Ziv-Lempel Trie. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_28
DOI: https://doi.org/10.1007/3-540-45735-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44158-8
Online ISBN: 978-3-540-45735-0
eBook Packages: Springer Book Archive