Abstract
We consider the problem of pattern matching when the text is in compressed form. As in Amir, Benson and Farach, we assume that the text is compressed by the Lempel-Ziv-Welch scheme. If the compressed text is of length $n$ and the pattern is of length $m$, our basic algorithm runs in $O(n + m\sqrt{m}\log m)$ steps, as against Amir et al.'s bound of $O(n + m^2)$ steps. We extend the basic algorithm into another that achieves, for any $k \ge 1$, $O(nk + m^{1+1/k}\log m)$ steps.
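To make the setting concrete, the following is a minimal sketch of textbook Lempel-Ziv-Welch coding together with the naive baseline of decompressing first and then searching. It is not the algorithm of this paper or of Amir et al.; the function names, the explicit `alphabet` parameter, and the use of a Python list of integer codes are assumptions made for illustration. The length of the code list plays the role of $n$ in the bounds above, while the decompressed text can be much longer.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for text; alphabet seeds the dictionary."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            # Keep extending the longest phrase already in the dictionary.
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # new phrase gets the next code
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Rebuild the original text from a list of LZW codes."""
    table = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


def naive_first_match(codes, pattern, alphabet):
    """Baseline only: decompress fully, then search the plain text."""
    return lzw_decompress(codes, alphabet).find(pattern)


codes = lzw_compress("abababab", "ab")
print(codes)                                  # [0, 1, 2, 4, 1]
print(naive_first_match(codes, "bab", "ab"))  # 1
```

The baseline costs time proportional to the length of the decompressed text, which is what compressed matching aims to avoid by working on the code sequence directly.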
Supported by NSF Grants CCR9107293 and CCR9508545
References
A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison-Wesley Publishing Co., Reading, Mass., 1974.
A. Amir, G. Benson, and M. Farach. Let sleeping files lie: Pattern matching in Z-compressed files. Proc. of 5th Annual ACM-SIAM Symp. on Discrete Algorithms, pages 705–714, 1994.
T. Eilam-Tsoreff and U. Vishkin. Matching patterns in a string subject to multilinear transformations. Proc. of International Workshop on Sequences, Combinatorics, Compression, Security and Transmission, Salerno, Italy, June 1988.
M. Farach and M. Thorup. Pattern matching in Lempel-Ziv compressed strings. Proc. of 27th Annual ACM Symp. on Theory of Computing, pages 703–712, 1995.
J. JaJa. An introduction to parallel algorithms. Addison-Wesley Publishing Co., Reading, Mass., 1992.
D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM J. on Computing, pages 323–350, 1977.
E. M. McCreight. A space-economical suffix tree construction algorithm. J. of the ACM, pages 262–272, 1976.
B. Schieber and U. Vishkin. On finding lowest common ancestors: Simplification and parallelization. SIAM J. on Computing, pages 1253–1262, 1988.
R. Sundar. Twists, turns, cascades, deque conjecture, and scanning theorem. Proc. of 30th Annual IEEE Symp. on Foundations of Computer Science, pages 555–559, 1989.
R. Tarjan. Efficiency of a good but not linear set union algorithm. J. of the ACM, pages 215–225, 1975.
R. Tarjan. Sequential access in splay trees takes linear time. Combinatorica, pages 367–378, 1985.
P. Weiner. Linear pattern matching algorithm. Proc. of 14th Annual IEEE Symp. on Switching and Automata Theory, pages 1–11, 1973.
T. A. Welch. A technique for high-performance data compression. IEEE Computer, pages 8–19, 1984.
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Trans. on Information Theory, pages 337–343, 1977.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kosaraju, S.R. (1995). Pattern matching in compressed texts. In: Thiagarajan, P.S. (eds) Foundations of Software Technology and Theoretical Computer Science. FSTTCS 1995. Lecture Notes in Computer Science, vol 1026. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60692-0_60
DOI: https://doi.org/10.1007/3-540-60692-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60692-5
Online ISBN: 978-3-540-49263-4
eBook Packages: Springer Book Archive