Indexed Multi-pattern Matching

Gagie, Travis; Karhu, Kalle; Kärkkäinen, Juha; Mäkinen, Veli; Salmela, Leena; Tarhio, Jorma

doi:10.1007/978-3-642-29344-3_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7256)

Included in the following conference series:

Latin American Symposium on Theoretical Informatics

Abstract

If we want to search sequentially for occurrences of many patterns in a given text, then we can apply any of dozens of multi-pattern matching algorithms in the literature. As far as we know, however, no one has said what to do if we are given a compressed self-index for the text instead of the text itself. In this paper we show how to take advantage of similarities between the patterns to speed up searches in an index. For example, we show how to store a string \(S[1..n]\) in \(n H_k(S) + o(n (H_k(S) + 1))\) bits such that, given the LZ77 parse of the concatenation of \(t\) patterns of total length \(\ell\) and maximum individual length \(m\), we can count the occurrences of each pattern in a total of \(\mathcal{O}\left((z + t) \log \ell \log m \log^{1 + \epsilon} n\right)\) time, where \(z\) is the number of phrases in the parse.
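To make the idea of exploiting similarities between patterns concrete, the following Python sketch counts occurrences of several patterns with a toy, uncompressed FM-index and reuses suffix-array intervals for suffixes that patterns share. It is not the algorithm of the paper and does not achieve the bound above; it only illustrates that similar patterns let an index skip repeated work. The names `build_index`, `count_all` and `lz77_phrase_count` are hypothetical, the index is built naively, and the parse shown is one simple greedy LZ77 variant (each phrase is either a single new character or the longest substring that also starts earlier in the string), which is an assumption and may differ from the variant used in the paper.

```python
def build_index(text):
    # Toy FM-index: naive suffix array, BWT, C array and full occ table.
    s = text + "\0"                                   # sentinel, smaller than any other char
    sa = sorted(range(len(s)), key=lambda i: s[i:])   # naive suffix array construction
    bwt = [s[i - 1] for i in sa]                      # for i == 0, s[-1] is the sentinel
    alphabet = sorted(set(s))
    C, total = {}, 0                                  # C[c] = number of chars smaller than c
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    occ = {c: [0] * (len(s) + 1) for c in alphabet}   # occ[c][i] = count of c in bwt[:i]
    for i, b in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (b == c)
    return C, occ, len(s)


def count_all(text, patterns):
    # Count each pattern by backward search, caching the interval of every
    # pattern suffix seen so far so that shared suffixes are searched once.
    C, occ, n = build_index(text)
    cache = {}                                        # suffix -> half-open SA interval
    steps = 0                                         # backward-search steps actually run
    counts = []
    for p in patterns:
        lo, hi = 0, n                                 # interval of the empty string
        for i in range(len(p) - 1, -1, -1):
            suffix = p[i:]                            # slicing is O(m); fine for a toy
            if suffix not in cache:
                c = p[i]
                if c in C:
                    lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
                else:
                    lo, hi = 0, 0                     # character absent from the text
                cache[suffix] = (lo, hi)
                steps += 1
            lo, hi = cache[suffix]
        counts.append(hi - lo)
    return counts, steps


def lz77_phrase_count(s):
    # Number of phrases z in a naive greedy LZ77 parse (self-overlap allowed).
    i, z = 0, 0
    while i < len(s):
        length = 0
        # Extend while s[i:i+length+1] also occurs starting at a position < i.
        while i + length < len(s) and s.find(s[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += max(length, 1)                           # a new character is its own phrase
        z += 1
    return z


if __name__ == "__main__":
    patterns = ["abra", "cadabra", "dabra", "bra", "xyz"]
    counts, steps = count_all("abracadabra", patterns)
    print(counts, steps)
    print(lz77_phrase_count("".join(patterns)), sum(map(len, patterns)))
```

In this toy run the five patterns have total length 22, but only 10 backward-search steps are executed because several patterns share suffixes such as "abra". The paper's bound expresses a related saving through the number of phrases \(z\) in the LZ77 parse of the concatenated patterns, which can be much smaller than \(\ell\) when the patterns are similar.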

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Aalto University, Finland
Travis Gagie, Kalle Karhu & Jorma Tarhio
Department of Computer Science, University of Helsinki, Finland
Juha Kärkkäinen, Veli Mäkinen & Leena Salmela

Editor information

Editors and Affiliations

Department of Computer Science, Iowa State University, 50011, Ames, IA, USA
David Fernández-Baca

About this paper

Cite this paper

Gagie, T., Karhu, K., Kärkkäinen, J., Mäkinen, V., Salmela, L., Tarhio, J. (2012). Indexed Multi-pattern Matching. In: Fernández-Baca, D. (ed.) LATIN 2012: Theoretical Informatics. LATIN 2012. Lecture Notes in Computer Science, vol 7256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29344-3_34

DOI: https://doi.org/10.1007/978-3-642-29344-3_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29343-6
Online ISBN: 978-3-642-29344-3
eBook Packages: Computer Science, Computer Science (R0)
