
Exact Analysis of Horspool’s and Sunday’s Pattern Matching Algorithms with Probabilistic Arithmetic Automata

  • Conference paper
Language and Automata Theory and Applications (LATA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6031)

Abstract

We define deterministic arithmetic automata (DAAs) and connect them to a framework called probabilistic arithmetic automata (PAAs) [9]. We use DAAs and PAAs to compute the entire exact probability distribution (in contrast to, e.g., asymptotic expectation and variance) of the number \(X^p_\ell\) of text characters accessed by the Horspool or Sunday pattern matching algorithms when matching a fixed pattern p against a random text of length ℓ. The random text model can be quite general, from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). We develop several alternative constructions with different state spaces of the automata, leading to alternative time and space complexities for the computations. To our knowledge, this is the first time that suffix-based pattern matching algorithms are analyzed exactly. We present (perhaps surprising) exemplary results on short patterns and moderate text lengths. Our results easily generalize to any search-window based pattern matching algorithm.
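To make the quantity \(X^p_\ell\) concrete, the following is a minimal sketch that counts the text-character comparisons made by Horspool's algorithm and obtains the exact distribution by enumerating every text of a given length. It is not the DAA/PAA construction described in the abstract: it is a brute-force check that is only feasible for tiny alphabets and text lengths, and it assumes a uniform i.i.d. text model. The function names `horspool_accesses` and `exact_distribution` are hypothetical, and counting one access per character comparison is an assumption about the cost measure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_accesses(pattern: str, text: str) -> int:
    """Count text-character comparisons made by Horspool's algorithm."""
    m = len(pattern)
    # Shift table: for each character c in pattern[:-1], the distance from
    # its last occurrence there to the end of the pattern. Characters that
    # do not occur in pattern[:-1] shift by the full pattern length m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    pos = 0
    while pos + m <= len(text):
        # Compare the window right to left until a mismatch or a full match.
        j = m - 1
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        # The shift depends only on the rightmost character of the window.
        pos += shift.get(text[pos + m - 1], m)
    return accesses


def exact_distribution(pattern: str, alphabet: str, length: int) -> dict:
    """Exact distribution of the access count over uniform random texts.

    Brute force over all len(alphabet) ** length texts (assumed uniform
    i.i.d. model), so only usable for very small inputs.
    """
    counts = Counter(
        horspool_accesses(pattern, "".join(chars))
        for chars in product(alphabet, repeat=length)
    )
    total = len(alphabet) ** length
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


if __name__ == "__main__":
    for accesses, prob in exact_distribution("ab", "ab", 6).items():
        print(accesses, prob)
```

Enumeration grows exponentially in the text length, which is exactly the cost that an automaton-based computation such as the one described above is meant to avoid.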


References

  1. Baeza-Yates, R.A., Gonnet, G.H., Régnier, M.: Analysis of Boyer-Moore-type string searching algorithms. In: SODA ’90: Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, pp. 328–343. SIAM, Philadelphia (1990)

  2. Baeza-Yates, R.A., Régnier, M.: Average running time of the Boyer-Moore-Horspool algorithm. Theor. Comput. Sci. 92(1), 19–31 (1992)

  3. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the ACM 20(10), 762–772 (1977)

  4. Herms, I., Rahmann, S.: Computing alignment seed sensitivity with probabilistic arithmetic automata. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 318–329. Springer, Heidelberg (2008)

  5. Horspool, R.N.: Practical fast searching in strings. Software-Practice and Experience 10, 501–506 (1980)

  6. Knuth, D.E., Morris, J., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)

  7. Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology 4(2), 553–569 (2006)

  8. Mahmoud, H.M., Smythe, R.T., Régnier, M.: Analysis of Boyer-Moore-Horspool string-matching heuristic. Random Structures and Algorithms 10(1-2), 169–186 (1997)

  9. Marschall, T., Rahmann, S.: Probabilistic arithmetic automata and their application to pattern matching statistics. In: Ferragina, P., Landau, G.M. (eds.) CPM 2008. LNCS, vol. 5029, pp. 95–106. Springer, Heidelberg (2008)

  10. Marschall, T., Rahmann, S.: Efficient exact motif discovery. Bioinformatics 25(12), i356–i364 (2009)

  11. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings. Cambridge University Press, Cambridge (2002)

  12. Schulz, M., Weese, D., Rausch, T., Döring, A., Reinert, K., Vingron, M.: Fast and adaptive variable order Markov chain construction. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 306–317. Springer, Heidelberg (2008)

  13. Smythe, R.T.: The Boyer-Moore-Horspool heuristic with Markovian input. Random Structures and Algorithms 18(2), 153–163 (2001)

  14. Sunday, D.M.: A very fast substring search algorithm. Communications of the ACM 33(8), 132–142 (1990)


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marschall, T., Rahmann, S. (2010). Exact Analysis of Horspool’s and Sunday’s Pattern Matching Algorithms with Probabilistic Arithmetic Automata. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_37

  • DOI: https://doi.org/10.1007/978-3-642-13089-2_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13088-5

  • Online ISBN: 978-3-642-13089-2

  • eBook Packages: Computer Science, Computer Science (R0)
