Indexed Multi-pattern Matching

  • Conference paper
LATIN 2012: Theoretical Informatics (LATIN 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7256)

Abstract

If we want to search sequentially for occurrences of many patterns in a given text, then we can apply any of dozens of multi-pattern matching algorithms in the literature. As far as we know, however, no one has said what to do if we are given a compressed self-index for the text instead of the text itself. In this paper we show how to take advantage of similarities between the patterns to speed up searches in an index. For example, we show how to store a string \(S[1..n]\) in \(n H_k(S) + o(n (H_k(S) + 1))\) bits such that, given the LZ77 parse of the concatenation of \(t\) patterns of total length \(\ell\) and maximum individual length \(m\), we can count the occurrences of each pattern in a total of \(\mathcal{O}((z + t) \log \ell \log m \log^{1 + \epsilon} n)\) time, where \(z\) is the number of phrases in the parse.
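
To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses a brute-force, quadratic-time greedy parse (one common LZ77 variant, with self-overlapping copies allowed) and counts occurrences by scanning the plain text rather than querying a compressed self-index. The function names `lz77_parse`, `lz77_decode`, and `count_occurrences`, and the example patterns, are assumptions made for illustration. The sketch only shows why similar patterns give a parse with few phrases, so that z can be much smaller than ℓ.

```python
def lz77_parse(s):
    """Greedy LZ77-style parse (illustrative variant, brute force).

    Each phrase is either a literal, stored as (char,), or a copy, stored as
    (source, length): the longest prefix of the remaining suffix that also
    starts at an earlier position. Copies may overlap the current position.
    """
    phrases = []
    i = 0
    while i < len(s):
        best_len, best_src = 0, -1
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < len(s) and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            phrases.append((s[i],))  # character not seen before
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Invert lz77_parse; copying one character at a time handles overlaps."""
    out = []
    for p in phrases:
        if len(p) == 1:
            out.append(p[0])
        else:
            src, length = p
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences by plain sequential search."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


# Hypothetical example: t similar patterns, total length ell, maximum length m.
patterns = ["abracadabra", "abracadabro", "cadabra"]
concatenation = "".join(patterns)
parse = lz77_parse(concatenation)

t = len(patterns)
ell = len(concatenation)
m = max(len(p) for p in patterns)
z = len(parse)  # number of phrases in the parse

text = "abracadabra abracadabro"
counts = {p: count_occurrences(text, p) for p in patterns}
```

Because the second and third patterns largely repeat material from the first, most of their characters are covered by a few long copy phrases. This kind of similarity between patterns is what the abstract's bound exploits, since the running time depends on z + t rather than on ℓ directly.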

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gagie, T., Karhu, K., Kärkkäinen, J., Mäkinen, V., Salmela, L., Tarhio, J. (2012). Indexed Multi-pattern Matching. In: Fernández-Baca, D. (eds) LATIN 2012: Theoretical Informatics. LATIN 2012. Lecture Notes in Computer Science, vol 7256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29344-3_34

  • DOI: https://doi.org/10.1007/978-3-642-29344-3_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29343-6

  • Online ISBN: 978-3-642-29344-3

  • eBook Packages: Computer Science, Computer Science (R0)
