A bit-parallel approach to suffix automata: Fast extended string matching

  • Session I
  • Conference paper
Combinatorial Pattern Matching (CPM 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1448)

Abstract

We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a “suffix automaton”, which is made deterministic during preprocessing. BNDM, instead, simulates the nondeterministic version using bit-parallelism. BNDM is 20%–25% faster than BDM, 2–3 times faster than other bit-parallel algorithms, and 10%–40% faster than all algorithms of the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g., on English text it is the fastest for patterns of 5 to 110 characters). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text and multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.
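The core idea of BNDM, simulating the nondeterministic suffix automaton with a bit vector while reading each text window from right to left, can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' code: the function name `bndm_search`, the variable names `B`, `D` and `last`, the use of a dictionary for the character table, and the representation of a character class as a string of allowed characters are all our own assumptions. Bit `m-1-j` of `B[c]` is set when `c` may occur at pattern position `j`; whenever the top bit of `D` is set, the characters read so far form a prefix of the pattern, which fixes the next safe shift. In a language like C, `D` would typically be a single machine word, so `m` is bounded by the word length; Python's unbounded integers lift that limit at the cost of speed.

```python
def bndm_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern in text.

    Sketch only (hypothetical helper). pattern is a sequence of character
    classes: each element is a string of the characters allowed at that
    position. A plain string such as "abc" is therefore the exact pattern
    a, b, c, while ["a", "bc", "d"] means a, then b or c, then d.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1   # keep D at m bits (Python ints never overflow)
    high = 1 << (m - 1)   # set when the characters read form a pattern prefix

    # B[c] has bit (m-1-j) set iff c is allowed at pattern position j.
    # Allowing several characters per position gives character classes.
    B = {}
    for j, cls in enumerate(pattern):
        for c in cls:
            B[c] = B.get(c, 0) | (1 << (m - 1 - j))

    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        D = mask  # every automaton state active before reading the window
        while D:
            # Read the window backwards, one text character at a time;
            # characters absent from the pattern clear every state.
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    last = j                 # longest prefix seen: safe shift
                else:
                    occurrences.append(pos)  # the whole window matched
            D = (D << 1) & mask
        pos += last
    return occurrences


if __name__ == "__main__":
    print(bndm_search("xabcabc", "abc"))                # [1, 4]
    print(bndm_search("abdxacdabd", ["a", "bc", "d"]))  # [0, 4, 7]
```

The inner loop stops as soon as `D` becomes zero, that is, as soon as the characters read are no longer a factor of the pattern; the window then advances by `last`, which is how the method skips text characters.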

Partially supported by Chilean Fondecyt grant 1-950622.

Editor information

Martin Farach-Colton

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navarro, G., Raffinot, M. (1998). A bit-parallel approach to suffix automata: Fast extended string matching. In: Farach-Colton, M. (eds) Combinatorial Pattern Matching. CPM 1998. Lecture Notes in Computer Science, vol 1448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030778

  • DOI: https://doi.org/10.1007/BFb0030778

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64739-3

  • Online ISBN: 978-3-540-69054-2

  • eBook Packages: Springer Book Archive