Speeding up two string-matching algorithms

Crochemore, M.; Czumaj, A.; Gasieniec, L.; Jarominek, S.; Lecroq, T.; Plandowski, W.; Rytter, W.

doi:10.1007/BF01185427

Published: November 1994

Algorithmica, Volume 12, pages 247–267 (1994)

M. Crochemore¹,
A. Czumaj²,
L. Gasieniec²,
S. Jarominek²,
T. Lecroq¹,
W. Plandowski² &
W. Rytter²

Abstract

We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm goes as far as the scanned segment (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift of the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with small coefficient), and to speed up the BM algorithm (to make at most 2·n comparisons). Only a constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.
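The right-to-left factor scan described in the abstract can be illustrated with a minimal Python sketch of the basic (non-accelerated) RF idea. It is not the authors' implementation: the function name `rf_search` is hypothetical, and Python's substring test `in` stands in for the factor graph of the reversed pattern, so each factor check here costs far more than it would with that data structure. The sketch also forgets its history after every shift, so it does not include the speed-up the paper proposes.

```python
def rf_search(pattern: str, text: str) -> list[int]:
    """Return all start positions of pattern in text (basic RF-style scan).

    Assumption: the naive test `segment in pattern` replaces the factor
    graph (suffix automaton) of the reversed pattern used in the paper.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    matches = []
    j = 0  # left end of the current window text[j : j + m]
    while j <= n - m:
        i = m      # scanned segment is text[j + i : j + m], initially empty
        shift = m  # default shift when no pattern prefix ends the window
        # Extend the segment leftwards while it is still a factor of the pattern.
        while i > 0 and text[j + i - 1 : j + m] in pattern:
            i -= 1
            # A proper window suffix that is a prefix of the pattern bounds
            # the safe shift; the last one recorded is the longest.
            if i > 0 and pattern.startswith(text[j + i : j + m]):
                shift = i
        if i == 0:
            # The whole window is a factor of length m, so it equals the pattern.
            matches.append(j)
        j += shift
    return matches


if __name__ == "__main__":
    print(rf_search("aba", "abababa"))    # [0, 2, 4]
    print(rf_search("abc", "xxabcxabc"))  # [2, 6]
```

The shift aligns the longest prefix of the pattern that is also a suffix of the current window, which is why no occurrence is skipped: once the scanned segment stops being a factor, no longer window suffix can be a prefix of the pattern.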

Author information

Authors and Affiliations

LITP, Institut Blaise Pascal, Université Paris 7, 2 Place Jussieu, 75251, Paris Cedex 05, France
M. Crochemore & T. Lecroq
Institute of Informatics, Warsaw University, ul. Banacha 2, 00-913, Warsaw 59, Poland
A. Czumaj, L. Gasieniec, S. Jarominek, W. Plandowski & W. Rytter

Additional information

Communicated by Alberto Apostolico.

The work by M. Crochemore and T. Lecroq was partially supported by PRC “Mathématiques-Informatique,” M. Crochemore was also partially supported by NATO Grant CRG 900293, and the work by A. Czumaj, L. Gasieniec, S. Jarominek, W. Plandowski, and W. Rytter was supported by KBN of the Polish Ministry of Education.

About this article

Cite this article

Crochemore, M., Czumaj, A., Gasieniec, L. et al. Speeding up two string-matching algorithms. Algorithmica 12, 247–267 (1994). https://doi.org/10.1007/BF01185427

Received: 15 August 1991
Revised: 11 November 1992
Issue Date: November 1994
DOI: https://doi.org/10.1007/BF01185427
