Abstract
Countless variants of the Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern \(p[1\mathinner{\ldotp\ldotp} m]\) and a Lempel-Ziv representation of a string \(t[1\mathinner{\ldotp\ldotp} N]\), does p occur in t? Farach and Thorup [5] gave a randomized \(\mathcal{O}(n\log^2\frac{N}{n}+m)\) time solution for this problem, where n is the size of the compressed representation of t. Building on the methods of [3] and [6], we improve their result by developing a faster and fully deterministic \(\mathcal{O}(n\log\frac{N}{n}+m)\) time algorithm with the same space complexity. Note that for highly compressible texts, \(\log\frac{N}{n}\) might be of order n, so for such inputs the improvement is very significant. A small fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan [4].
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in z-compressed files. In: SODA 1994: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 705–714. SIAM, Philadelphia (1994)
Bender, M.A., Farach-Colton, M.: The lca problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the smallest grammar: Kolmogorov complexity in natural models. In: STOC 2002: Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, pp. 792–801. ACM, New York (2002)
Farach, M., Muthukrishnan, S.: Perfect hashing for strings: Formalization and algorithms. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 130–140. Springer, Heidelberg (1996)
Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. In: STOC 1995: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, pp. 703–712. ACM, New York (1995)
Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: SODA 2011: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (2011)
Iacono, J., Özkan, Ö.: Mergeable dictionaries. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS, vol. 6198, pp. 164–175. Springer, Heidelberg (2010)
Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53(6), 918–936 (2006)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theor. Comput. Sci. 298, 253–272 (2003)
Kosaraju, S.R.: Pattern matching in compressed texts. In: Thiagarajan, P.S. (ed.) FSTTCS 1995. LNCS, vol. 1026, pp. 349–362. Springer, Heidelberg (1995)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Yao, A.C.C.: Lower bounds for algebraic computation trees with integer inputs. In: Proceedings of the 30th Annual Symposium on Foundations of Computer Science, pp. 308–313. IEEE Computer Society, Washington, DC, USA (1989)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gawrychowski, P. (2011). Pattern Matching in Lempel-Ziv Compressed Strings: Fast, Simple, and Deterministic. In: Demetrescu, C., Halldórsson, M.M. (eds) Algorithms – ESA 2011. ESA 2011. Lecture Notes in Computer Science, vol 6942. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23719-5_36
Download citation
DOI: https://doi.org/10.1007/978-3-642-23719-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23718-8
Online ISBN: 978-3-642-23719-5
eBook Packages: Computer ScienceComputer Science (R0)