A unified view to string matching algorithms

  • Invited Papers
  • Conference paper
SOFSEM'96: Theory and Practice of Informatics (SOFSEM 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1175)

Abstract

We present a unified view of sequential algorithms for many pattern matching problems, using a finite automaton that is built from the pattern and takes the text as input. We show the limitations of deterministic finite automata (DFA) and the advantages of a bit-wise simulation of non-deterministic finite automata (NFA). This approach gives very fast practical algorithms with good complexity for small patterns on a RAM machine with word length O(log n), where n is the size of the text. For generalized string matching with a pattern of length m, the time complexity is O(mn/log n), which is linear for small patterns. For approximate string matching we show that the two main known approaches to the problem are variations of the NFA simulation. For this case we present a different simulation technique that, for small patterns, runs in O(n) time independently of the maximum number of errors allowed, k. This algorithm improves on the best bit-wise or comparison-based algorithms, which run in O(kn) time, and can be used as a basic block for algorithms with good average-case behavior. We also formalize a previous bit-wise simulation of general NFAs, achieving O(mn log log n/log n) time.
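The idea of simulating an NFA with bit operations can be illustrated with a minimal sketch. This is not the paper's own code. It shows the well-known Shift-And scheme for a pattern whose positions are character classes (one form of generalized string matching), plus a row-wise extension for up to k insertions, deletions or substitutions in the style of Wu and Manber. The function names `build_masks`, `shift_and` and `shift_and_k` are hypothetical. The row-wise version costs O(k) word operations per text character, so it corresponds to the O(kn) approach the abstract improves on, not to the new O(n) simulation technique. Python integers are unbounded; in a language with fixed-size words, each state would fit in one machine word when m is at most the word length, which is the "small pattern" case.

```python
from typing import Dict, List, Sequence, Set


def build_masks(pattern: Sequence[Set[str]]) -> Dict[str, int]:
    """Bit i of masks[c] is set if character c is allowed at pattern position i."""
    masks: Dict[str, int] = {}
    for i, allowed in enumerate(pattern):
        for c in allowed:
            masks[c] = masks.get(c, 0) | (1 << i)
    return masks


def shift_and(text: str, pattern: Sequence[Set[str]]) -> List[int]:
    """Exact search for a pattern of character classes.
    Returns the 0-based end positions of all occurrences in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    masks = build_masks(pattern)
    final = 1 << (len(pattern) - 1)
    state = 0  # bit i set <=> pattern[0..i] matches a suffix of the text read so far
    ends = []
    for j, c in enumerate(text):
        # Advance every active NFA state by one position, re-activate the
        # initial state (| 1), and keep only transitions allowed by c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & final:
            ends.append(j)
    return ends


def shift_and_k(text: str, pattern: Sequence[Set[str]], k: int) -> List[int]:
    """Approximate search allowing up to k edits, one bit vector (NFA row)
    per error count. Returns 0-based end positions of matches."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    masks = build_masks(pattern)
    final = 1 << (m - 1)
    full = (1 << m) - 1
    # Before any text is read, row d has pattern prefixes of length <= d
    # active, reached by deleting up to d pattern characters.
    rows = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text):
        mask = masks.get(c, 0)
        prev_old = rows[0]  # row d-1 before reading c
        rows[0] = ((rows[0] << 1) | 1) & mask & full
        for d in range(1, k + 1):
            old = rows[d]
            rows[d] = (
                (((old << 1) | 1) & mask)   # match: advance within row d
                | prev_old                  # insertion: consume c, stay in place
                | (prev_old << 1)           # substitution: consume c, advance
                | (rows[d - 1] << 1)        # deletion: advance without consuming
                | 1                         # first position reachable with one edit
            ) & full
            prev_old = old
        if rows[k] & final:
            ends.append(j)
    return ends
```

For example, with the pattern `abc` written as `[{"a"}, {"b"}, {"c"}]`, `shift_and("xabcabc", ...)` reports end positions 3 and 6, and `shift_and_k("abdc", ..., k=1)` reports end positions 1, 2 and 3 (`ab` with one deletion, `abd` with one substitution, `abdc` with one insertion). A class such as `{"b", "c"}` simply sets the same bit in two masks, which is why character classes add no cost per text character.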

This work was partially funded by Fondecyt Chilean Grant 95-0622.

Editor information

Keith G. Jeffery, Jaroslav Král, Miroslav Bartošek

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baeza-Yates, R. (1996). A unified view to string matching algorithms. In: Jeffery, K.G., Král, J., Bartošek, M. (eds) SOFSEM'96: Theory and Practice of Informatics. SOFSEM 1996. Lecture Notes in Computer Science, vol 1175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0037393

  • DOI: https://doi.org/10.1007/BFb0037393

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61994-9

  • Online ISBN: 978-3-540-49588-8

  • eBook Packages: Springer Book Archive