A language approach to string searching evaluation

  • Conference paper
Combinatorial Pattern Matching (CPM 1992)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 644))

Abstract

We propose a general framework for deriving the average performance of string searching algorithms that preprocess the pattern. It relies mainly on languages and combinatorics on words, combined with some probabilistic tools. The approach is quite powerful: although we concentrate here on Morris-Pratt and Boyer-Moore-Horspool, it applies to a large class of algorithms. A fairly general character distribution is assumed, namely a Markovian one, suitable for applications such as searching natural language texts or biological databases. The average searching time, expressed as the number of text-pattern comparisons, is proven to be asymptotically Kn, and the linearity constant K is given.
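To make the cost measure concrete, the sketch below counts text-pattern comparisons for a simple right-to-left variant of Boyer-Moore-Horspool. It is an illustration only, not the paper's analysis: the function names `horspool_count` and `comparisons_per_char` are hypothetical, n is taken to be the text length (the abstract does not define it), and the random text uses independent uniform characters rather than the Markovian model assumed in the paper.

```python
import random


def horspool_count(text, pattern):
    """Return (match_positions, comparisons) for a Horspool-style search.

    Hypothetical helper: comparisons are the character-to-character tests
    between text and pattern, the cost measure named in the abstract.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Shift table: distance from the rightmost occurrence of each character
    # (ignoring the last pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches, comparisons = [], 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare the current window right to left, counting every test.
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift according to the text character aligned with the pattern end.
        pos += shift.get(text[pos + m - 1], m)
    return matches, comparisons


def comparisons_per_char(pattern, alphabet, n, seed=0):
    """Empirical comparisons/n on a random text (uniform i.i.d. assumption)."""
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(n))
    _, comparisons = horspool_count(text, pattern)
    return comparisons / n
```

For instance, `horspool_count("abracadabra", "abra")` returns `([0, 7], 9)`: two matches found with nine comparisons. Calling `comparisons_per_char` on a long text gives a rough empirical counterpart of the linearity constant K for that pattern and alphabet, under the simplified uniform model.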

This work was partially supported by the ESPRIT II Basic Research Actions Program of the EC under contract No. 3075 (project ALCOM).

Editor information

Alberto Apostolico Maxime Crochemore Zvi Galil Udi Manber

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Régnier, M. (1992). A language approach to string searching evaluation. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_2

  • DOI: https://doi.org/10.1007/3-540-56024-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56024-1

  • Online ISBN: 978-3-540-47357-2

  • eBook Packages: Springer Book Archive
