Abstract
If we want to search sequentially for occurrences of many patterns in a given text, then we can apply any of dozens of multi-pattern matching algorithms in the literature. As far as we know, however, no one has said what to do if we are given a compressed self-index for the text instead of the text itself. In this paper we show how to take advantage of similarities between the patterns to speed up searches in an index. For example, we show how to store a string \(S[1..n]\) in \(n H_k(S) + o(n (H_k(S) + 1))\) bits such that, given the LZ77 parse of the concatenation of t patterns of total length ℓ and maximum individual length m, we can count the occurrences of each pattern in a total of \(\mathcal{O}((z + t) \log \ell \log m \log^{1 + \epsilon} n)\) time, where z is the number of phrases in the parse.
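To make the parameter z concrete, the following is a minimal sketch of a greedy LZ77-style parse applied to a concatenation of patterns. It is an illustration only, not the construction used in the paper: the function names `lz77_parse` and `lz77_decode`, the phrase encoding, and the sample patterns are all assumptions, and the quadratic-time parser is chosen for readability rather than efficiency. The point it shows is that when patterns share long substrings, z can be much smaller than the total length ℓ.

```python
def lz77_parse(text):
    """Greedy LZ77-style parse (illustrative, quadratic time).

    Each phrase is either a literal (c,) for a character not seen before,
    or a copy (src, length) of the longest prefix of the remaining text
    that also starts at an earlier position src (self-overlap allowed).
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for src in range(i):
            length = 0
            # src < i, so src + length stays in range while i + length < n
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            phrases.append((text[i],))            # literal phrase
            i += 1
        else:
            phrases.append((best_src, best_len))  # copy phrase
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the text from the phrases, copying one character at a time
    so that self-overlapping copies are handled correctly."""
    out = []
    for phrase in phrases:
        if len(phrase) == 1:
            out.append(phrase[0])
        else:
            src, length = phrase
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


# Hypothetical sample patterns that share long substrings.
patterns = ["GATTACA", "GATTACC", "ATTACA"]
concatenation = "".join(patterns)
parse = lz77_parse(concatenation)

t = len(patterns)                      # number of patterns
total_len = len(concatenation)         # the total length (ℓ in the abstract)
m = max(len(p) for p in patterns)      # maximum individual length
z = len(parse)                         # number of phrases in the parse
print(f"t={t}, total length={total_len}, m={m}, z={z}")
```

For these sample patterns the parse has fewer phrases than characters, because the second and third patterns are mostly copies of earlier material; a bound that depends on z + t rather than on ℓ benefits from exactly this kind of repetition.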
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gagie, T., Karhu, K., Kärkkäinen, J., Mäkinen, V., Salmela, L., Tarhio, J. (2012). Indexed Multi-pattern Matching. In: Fernández-Baca, D. (eds) LATIN 2012: Theoretical Informatics. LATIN 2012. Lecture Notes in Computer Science, vol 7256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29344-3_34
DOI: https://doi.org/10.1007/978-3-642-29344-3_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29343-6
Online ISBN: 978-3-642-29344-3
eBook Packages: Computer Science (R0)