Abstract
We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (the BM algorithm) and a variant of it called here the reversed-factor algorithm (the RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm continues as long as the scanned segment is a suffix of the pattern, while the RF algorithm continues as long as the scanned segment is a factor of the pattern. Both then shift the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that remembering the last matched segment is enough to speed up the RF algorithm considerably (it then makes a linear number of comparisons with a small coefficient) and to speed up the BM algorithm with match-shifts (it then makes at most 2n comparisons). Only constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first is based on combinatorial properties of primitive words, and the two others make extensive use of suffix trees.
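The right-to-left scan and shift of the basic RF algorithm described above can be illustrated with a minimal Python sketch. It is not the accelerated algorithm of the paper and not the authors' implementation: the name `rf_search` is hypothetical, and a plain set of substrings stands in for the factor graph of the reversed pattern. That substitution makes preprocessing and each membership test far more expensive than in the real algorithm, but leaves the scanning and shifting logic visible.

```python
def rf_search(pattern, text):
    """Basic (non-accelerated) reversed-factor search; returns match positions.

    Assumption: a set of all substrings replaces the factor graph, so this
    sketch shows the control flow only, not the real running time.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # All factors (substrings) and all prefixes of the pattern.
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    prefixes = {pattern[:k] for k in range(1, m + 1)}

    occurrences = []
    pos = 0  # the current window is text[pos:pos + m]
    while pos <= n - m:
        j = m      # the scanned segment is text[pos + j:pos + m]
        shift = m  # default shift when no prefix of the pattern is seen
        # Scan right-to-left while the segment is a factor of the pattern.
        while j > 0 and text[pos + j - 1:pos + m] in factors:
            j -= 1
            if text[pos + j:pos + m] in prefixes:
                if j == 0:
                    occurrences.append(pos)  # the whole window matches
                else:
                    shift = j  # align the pattern start with this prefix
        pos += shift  # shift, forget the history, start again
    return occurrences


print(rf_search("aba", "abababa"))
```

The shift is determined by the longest proper prefix of the pattern found as a suffix of the window, so no occurrence can be skipped. Each window restarts the scan from scratch, and this forgetting of history is the source of the quadratic worst case mentioned in the abstract.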
Work by these authors is partially supported by PRC “Mathématiques-Informatique”.
Work by this author is partially supported by NATO Grant CRG 900293.
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crochemore, M. et al. (1992). Speeding up two string-matching algorithms. In: Finkel, A., Jantzen, M. (eds) STACS 92. STACS 1992. Lecture Notes in Computer Science, vol 577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-55210-3_215
DOI: https://doi.org/10.1007/3-540-55210-3_215
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55210-9
Online ISBN: 978-3-540-46775-5
eBook Packages: Springer Book Archive