Abstract
We define deterministic arithmetic automata (DAAs) and connect them to a framework called probabilistic arithmetic automata (PAAs) [9]. We use DAAs and PAAs to compute the entire exact probability distribution (in contrast to, e.g., asymptotic expectation and variance) of the number \(X^p_\ell\) of text characters accessed by the Horspool or Sunday pattern matching algorithms when matching a fixed pattern p against a random text of length ℓ. The random text model can be quite general, from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). We develop several alternative constructions with different state spaces of the automata, leading to alternative time and space complexities for the computations. To our knowledge, this is the first time that suffix-based pattern matching algorithms are analyzed exactly. We present (perhaps surprising) exemplary results on short patterns and moderate text lengths. Our results easily generalize to any search-window based pattern matching algorithm.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Baeza-Yates, R.A., Gonnet, G.H., Régnier, M.: Analysis of Boyer-Moore-type string searching algorithms. In: SODA ’90: Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, pp. 328–343. SIAM, Philadelphia (1990)
Baeza-Yates, R.A., Régnier, M.: Average running time of the Boyer-Moore-Horspool algorithm. Theor. Comput. Sci. 92(1), 19–31 (1992)
Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the ACM 20(10), 762–772 (1977)
Herms, I., Rahmann, S.: Computing alignment seed sensitivity with probabilistic arithmetic automata. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 318–329. Springer, Heidelberg (2008)
Horspool, R.N.: Practical fast searching in strings. Software-Practice and Experience 10, 501–506 (1980)
Knuth, D.E., Morris, J., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)
Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology 4(2), 553–569 (2006)
Mahmoud, H.M., Smythe, R.T., Régnier, M.: Analysis of Boyer-Moore-Horspool string-matching heuristic. Random Structures and Algorithms 10(1-2), 169–186 (1997)
Marschall, T., Rahmann, S.: Probabilistic arithmetic automata and their application to pattern matching statistics. In: Ferragina, P., Landau, G.M. (eds.) CPM 2008. LNCS, vol. 5029, pp. 95–106. Springer, Heidelberg (2008)
Marschall, T., Rahmann, S.: Efficient exact motif discovery. Bioinformatics 25(12), i356–i364 (2009)
Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings. Cambridge University Press, Cambridge (2002)
Schulz, M., Weese, D., Rausch, T., Döring, A., Reinert, K., Vingron, M.: Fast and adaptive variable order Markov chain construction. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 306–317. Springer, Heidelberg (2008)
Smythe, R.T.: The Boyer-Moore-Horspool heuristic with Markovian input. Random Structures and Algorithms 18(2), 153–163 (2001)
Sunday, D.M.: A very fast substring search algorithm. Communications of the ACM 33(8), 132–142 (1990)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marschall, T., Rahmann, S. (2010). Exact Analysis of Horspool’s and Sunday’s Pattern Matching Algorithms with Probabilistic Arithmetic Automata. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_37
Download citation
DOI: https://doi.org/10.1007/978-3-642-13089-2_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13088-5
Online ISBN: 978-3-642-13089-2
eBook Packages: Computer ScienceComputer Science (R0)