Abstract
Given an LZW/LZ78 compressed text, we want to find an approximate occurrence of a given pattern of length m. The goal is to achieve time complexity depending on the size n of the compressed representation of the text instead of its length. We consider two specific definitions of approximate matching, namely the Hamming distance and the edit distance, and show how to achieve \(\mathcal{O}(n\sqrt{m}k^{2})\) and \(\mathcal{O}(n\sqrt{m}k^{3})\) running time, respectively, where k is the bound on the distance, both in linear space. Even for very small values of k, the best previously known solutions required Ω(nm) time. Our main contribution is applying a periodicity-based argument in a way that is computationally effective even if we operate on a compressed representation of a string, while the previous solutions were either based on a dynamic programming, or a black-box application of tools developed for uncompressed strings.
Supported by NCN grant 2011/01/D/ST6/07164, 2011–2014.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in Z-compressed files. J. Comput. Syst. Sci. 52(2), 299–307 (1996)
Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. J. Algorithms 50(2), 257–275 (2004)
Bille, P., Fagerberg, R., Gørtz, I.L.: Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts. ACM Transactions on Algorithms 6(1) (2009)
Cole, R., Hariharan, R.: Approximate string matching: A simpler faster algorithm. SIAM J. Comput. 31(6), 1761–1782 (2002)
Crochemore, M., Rytter, W.: Jewels of stringology. World Scientific (2002)
Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, pp. 362–372. SIAM (2011)
Gawrychowski, P.: Simple and efficient LZW-compressed multiple pattern matching. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 232–242. Springer, Heidelberg (2012)
Gawrychowski, P.: Tying up the loose ends in fully LZW-compressed pattern matching. In: Dürr, C., Wilke, T. (eds.) STACS. LIPIcs, vol. 14, pp. 624–635. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2012)
Kärkkäinen, J., Navarro, G., Ukkonen, E.: Approximate string matching on Ziv-Lempel compressed text. J. Discrete Algorithms 1(3-4), 313–338 (2003)
Landau, G.M., Vishkin, U.: Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239–249 (1986)
Landau, G.M., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)
Welch, T.A.: A technique for high-performance data compression. Computer 17(6), 8–19 (1984)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gawrychowski, P., Straszak, D. (2013). Beating \(\mathcal{O}(nm)\) in Approximate LZW-Compressed Pattern Matching. In: Cai, L., Cheng, SW., Lam, TW. (eds) Algorithms and Computation. ISAAC 2013. Lecture Notes in Computer Science, vol 8283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45030-3_8
Download citation
DOI: https://doi.org/10.1007/978-3-642-45030-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45029-7
Online ISBN: 978-3-642-45030-3
eBook Packages: Computer ScienceComputer Science (R0)