Pattern Matching in Lempel-Ziv Compressed Strings: Fast, Simple, and Deterministic

  • Conference paper
Algorithms – ESA 2011 (ESA 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6942)

Abstract

Countless variants of Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern \(p[1\mathinner{\ldotp\ldotp} m]\) and a Lempel-Ziv representation of a string \(t[1\mathinner{\ldotp\ldotp} N]\), does \(p\) occur in \(t\)? Farach and Thorup [5] gave a randomized \(\mathcal{O}(n\log^2\frac{N}{n}+m)\) time solution for this problem, where \(n\) is the size of the compressed representation of \(t\). Building on the methods of [3] and [6], we improve their result by developing a faster and fully deterministic \(\mathcal{O}(n\log\frac{N}{n}+m)\) time algorithm with the same space complexity. Note that for highly compressible texts, \(\log\frac{N}{n}\) might be of order \(n\), so for such inputs the improvement is very significant. A small fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan [4].
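
To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline: decompress the text in full, then search it. It is not the algorithm of this paper, which avoids decompression altogether. The phrase format (`"lit"` and `"copy"` tuples) and the function names `decompress`, `occurs_naive` and `doubling_text` are assumptions made for illustration only. The helper `doubling_text` builds a highly compressible text whose length N is exponential in the number of phrases n, which is the regime in which \(\log\frac{N}{n}\) is of order n.

```python
from math import log2

# Hypothetical phrase format (an assumption, not the paper's notation):
#   ("lit", c)              -> a single literal character c
#   ("copy", start, length) -> copy `length` characters starting at 0-based
#                              position `start` of the text decoded so far


def decompress(phrases):
    """Expand a list of phrases into the uncompressed text t."""
    out = []
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, start, length = phrase
            # Copy one character at a time so that a reference overlapping
            # the part being written also decodes correctly.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


def occurs_naive(pattern, phrases):
    """Baseline: decompress fully, then search. Cost grows with N, not n."""
    return pattern in decompress(phrases)


def doubling_text(k):
    """Phrases for the text a^(2^k): one literal, then k doubling copies."""
    phrases = [("lit", "a")]
    for i in range(k):
        phrases.append(("copy", 0, 2 ** i))
    return phrases


phrases = doubling_text(16)
n = len(phrases)              # number of phrases: k + 1
N = len(decompress(phrases))  # uncompressed length: 2^k
print(n, N, log2(N / n))
print(occurs_naive("aaaa", phrases), occurs_naive("ab", phrases))
```

For this family of texts, n = k + 1 and N = 2^k, so \(\log\frac{N}{n} = k - \log(k+1)\) grows linearly in n. The baseline still spends time proportional to N, which is what algorithms working directly on the compressed representation are designed to avoid.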

References

  1. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in z-compressed files. In: SODA 1994: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 705–714. SIAM, Philadelphia (1994)

  2. Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)

  3. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the smallest grammar: Kolmogorov complexity in natural models. In: STOC 2002: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pp. 792–801. ACM, New York (2002)

  4. Farach, M., Muthukrishnan, S.: Perfect hashing for strings: Formalization and algorithms. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 130–140. Springer, Heidelberg (1996)

  5. Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. In: STOC 1995: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, pp. 703–712. ACM, New York (1995)

  6. Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: SODA 2011: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (2011)

  7. Iacono, J., Özkan, Ö.: Mergeable dictionaries. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS, vol. 6198, pp. 164–175. Springer, Heidelberg (2010)

  8. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53(6), 918–936 (2006)

  9. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

  10. Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theor. Comput. Sci. 298, 253–272 (2003)

  11. Kosaraju, S.R.: Pattern matching in compressed texts. In: Thiagarajan, P.S. (ed.) FSTTCS 1995. LNCS, vol. 1026, pp. 349–362. Springer, Heidelberg (1995)

  12. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)

  13. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

  14. Yao, A.C.C.: Lower bounds for algebraic computation trees with integer inputs. In: Proceedings of the 30th Annual Symposium on Foundations of Computer Science, pp. 308–313. IEEE Computer Society, Washington, DC, USA (1989)


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gawrychowski, P. (2011). Pattern Matching in Lempel-Ziv Compressed Strings: Fast, Simple, and Deterministic. In: Demetrescu, C., Halldórsson, M.M. (eds) Algorithms – ESA 2011. ESA 2011. Lecture Notes in Computer Science, vol 6942. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23719-5_36

  • DOI: https://doi.org/10.1007/978-3-642-23719-5_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23718-8

  • Online ISBN: 978-3-642-23719-5

  • eBook Packages: Computer Science, Computer Science (R0)
