Skip to main content

Efficient algorithms for Lempel-Ziv encoding

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1097))

Abstract

We consider several basic problems for texts and show that if the input texts are given by their Lempel-Ziv codes then the problems can be solved deterministically in polynomial time in the case when the original (uncompressed) texts are of exponential size. The growing importance of massively stored information requires new approaches to algorithms for compressed texts without decompressing. Denote by LZ(ω) the version of a string ω produced by Lempel-Ziv encoding algorithm. For given compressed strings LZ(T), LZ(P) we give the first known deterministic polynomial time algorithms to compute compressed representations of the set of all occurrences of the pattern P in T, all periods of T, all palindromes of T, and all squares of T. Then we consider several classical language recognition problems:

  • regular language recognition: given LZ(T) and a language L described by a regular expression, test if T ε L,

  • extended regular language recognition: given LZ(T) and a language L described by a LZ-compressed regular expression, test if T ε L, the alphabet is unary,

  • context-free language recognition: given LZ(T) and a language L described by a context-free grammar, test if T ε L, the alphabet is unary.

We show that the first recognition problem has a polynomial time algorithm and the other two problems are NP-hard.

We show also that the LZ encoding can be computed on-line in polynomial time delay and small space (i.e. proportional to the size of the compressed text). Also the compressed representation of a patternmatching automaton for the compressed pattern is computed in polynomial time.

On leave from Institute of Informatics, Warsaw University, ul. Banacha 2, 02097, Warszawa, Poland.

This research was partially supported by the DFG Grant KA 673/4-1, and by the ESPRIT BR Grant 7097 and the ECUS 030.

Supported partially by the grant KBN 8T11C01208.

Supported partially by the grant KBN 8T11C01208.

This is a preview of subscription content, log in via an institution.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. A. Amir, G. Benson and M. Farach, Let sleeping files lie: pattern-matching in Z-compressed files, in SODA '94.

    Google Scholar 

  2. A. Amir, G. Benson, Efficient two dimensional compressed matching, Proc. of the 2nd IEEE Data Compression Conference 279–288 (1992).

    Google Scholar 

  3. A. Amir, G. Benson and M. Farach, Optimal two-dimensional compressed matching, in ICALP'94.

    Google Scholar 

  4. A. Apostolico, D. Breslauer, Z. Galil, Optimal parallel algorithms for periods, palindromes and squares, in ICALP'92, pp. 296–307.

    Google Scholar 

  5. M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press, New York (1994).

    Google Scholar 

  6. M. Farach and M. Thorup, String matching in Lempel-Ziv compressed strings, in STOC'95, pp. 703–712.

    Google Scholar 

  7. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman (1979).

    Google Scholar 

  8. L. Gasieniec, M. Karpiński, W. Plandowski and W. Rytter, Randomized Efficient Algorithms for Compressed Strings: the Finger-Print Approach, to appear in proceedings of the 7th Combinatorial Pattern Matching, Laguna Beach (1996).

    Google Scholar 

  9. M. Karpinski, W. Rytter and A. Shinohara, Pattern-matching for strings with short description, in Combinatorial Pattern Matching, 1995.

    Google Scholar 

  10. D. Knuth, The Art of Computing, Vol. II: Seminumerical Algorithms. Second edition. Addison-Wesley, 1981.

    Google Scholar 

  11. A. Lempel and J. Ziv, On the complexity of finite sequences, IEEE Trans. on Inf. Theory 22, 75–81 (1976).

    Google Scholar 

  12. W. Plandowski, Testing equivalence of morphisms on context-free languages, ESA'94, Lecture Notes in Computer Science 855, Springer-Verlag, 460–470 (1994).

    Google Scholar 

  13. J. Storer, Data compression: methods and theory, Computer Science Press, Rockville, Maryland, 1988.

    Google Scholar 

  14. J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. on Inf. Theory vo. IT-23(3), 337–343, 1977.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Rolf Karlsson Andrzej Lingas

Rights and permissions

Reprints and permissions

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gasieniec, L., Karpinski, M., Plandowski, W., Rytter, W. (1996). Efficient algorithms for Lempel-Ziv encoding. In: Karlsson, R., Lingas, A. (eds) Algorithm Theory — SWAT'96. SWAT 1996. Lecture Notes in Computer Science, vol 1097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61422-2_148

Download citation

  • DOI: https://doi.org/10.1007/3-540-61422-2_148

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61422-7

  • Online ISBN: 978-3-540-68529-6

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics