Abstract
The LZ-index is a compressed full-text self-index able to represent a text \(T_{1\ldots u}\), over an alphabet of size \(\sigma = O(\textrm{polylog}(u))\) and with \(k\)-th order empirical entropy \(H_k(T)\), using \(4uH_k(T) + o(u\log\sigma)\) bits for any \(k = o(\log_\sigma u)\). It can report all the \(occ\) occurrences of a pattern \(P_{1\ldots m}\) in \(T\) in \(O(m^3\log\sigma + (m + occ)\log u)\) worst-case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of the LZ-index. In both cases we achieve \((2 + \epsilon)uH_k(T) + o(u\log\sigma)\) bits of space, for any constant \(\epsilon > 0\), and we simultaneously improve the search time to \(O(m^2\log m + (m + occ)\log u)\). Both indexes support displaying any subtext of length \(\ell\) in optimal \(O(\ell/\log_\sigma u)\) time. In addition, we show how the space can be squeezed to \((1 + \epsilon)uH_k(T) + o(u\log\sigma)\) bits to obtain a structure with \(O(m^2)\) average search time for \(m \geqslant 2\log_\sigma u\).
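The LZ-index is built on the LZ78 parsing of Ziv and Lempel, which cuts the text into phrases, each equal to an earlier phrase extended by one character, so that the phrases form a trie. The Python sketch below illustrates only that parsing step; it is not the paper's data structure. The function names `lz78_parse` and `phrase_strings` are hypothetical. Emitting a repeated final phrase when the text ends inside an existing one is an assumption made here for simplicity.

```python
def lz78_parse(text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: LZ78-parse text into (parent_phrase_id, char) pairs.

    Phrase id 0 is the empty phrase (the trie root); new phrases get ids 1, 2, ...
    """
    trie: dict[tuple[int, str], int] = {}  # (parent id, char) -> child phrase id
    phrases: list[tuple[int, str]] = []
    node = 0  # current trie node, starting at the root (empty phrase)
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child  # keep extending along an existing trie edge
        else:
            phrases.append((node, ch))  # new phrase = phrase `node` + ch
            trie[(node, ch)] = len(phrases)
            node = 0  # restart from the root for the next phrase
    if node != 0:
        # Assumption: the text ended inside an existing phrase, so repeat it.
        phrases.append(phrases[node - 1])
    return phrases


def phrase_strings(phrases: list[tuple[int, str]]) -> list[str]:
    """Hypothetical helper: expand (parent, char) pairs back into phrase strings."""
    out = [""]  # index 0 holds the empty phrase
    for parent, ch in phrases:
        out.append(out[parent] + ch)
    return out[1:]


if __name__ == "__main__":
    parsed = lz78_parse("aababcabcd")
    print(phrase_strings(parsed))  # ['a', 'ab', 'abc', 'abcd']
```

Concatenating the phrase strings recovers the original text. The number of phrases produced is the quantity that LZ78-based space analyses work with.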
Supported in part by CONICYT PhD Fellowship Program (first author) and Fondecyt Grant 1-050493 (second author) and the Grant-in-Aid of the Ministry of Education, Science, Sports and Culture of Japan (third author).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arroyuelo, D., Navarro, G., Sadakane, K. (2006). Reducing the Space Requirement of LZ-Index. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_29
DOI: https://doi.org/10.1007/11780441_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science, Computer Science (R0)