Abstract
We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a “suffix automaton” which is made deterministic in the preprocessing. BNDM instead simulates the nondeterministic version using bit-parallelism. This algorithm is 20%–25% faster than BDM, 2–3 times faster than other bit-parallel algorithms, and 10%–40% faster than the entire Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g., on English text it is the fastest for patterns between 5 and 110 characters long). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, to search for multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.
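As an illustration of the idea summarized above, the following is a minimal Python sketch of a BNDM-style search. It is not the authors' code: the function name `bndm_search`, the variable names, and the convention that each pattern position is given as a string of allowed characters are assumptions made for this example. The text window is read backwards, a bit vector `d` tracks which pattern factors are still compatible with what has been read, and the longest pattern prefix seen determines the shift. Python integers are unbounded, so the sketch does not enforce the machine-word limit on pattern length that a real bit-parallel implementation would have.

```python
def bndm_search(pattern, text):
    """Return start positions of all matches of `pattern` in `text`.

    Sketch only. `pattern` may be a plain string, or a sequence whose
    elements are strings listing the characters allowed at that position
    (an assumed, simplified way to express classes of characters).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Preprocessing: bit (m - 1 - j) of mask[c] is set when character c
    # is allowed at pattern position j.
    mask = {}
    for j, allowed in enumerate(pattern):
        for ch in allowed:
            mask[ch] = mask.get(ch, 0) | (1 << (m - 1 - j))

    full = (1 << m) - 1   # all m states active
    high = 1 << (m - 1)   # set when the text read so far is a pattern prefix
    matches = []
    pos = 0
    while pos <= n - m:
        j = m             # number of window characters still unread
        last = m          # shift to apply after this window
        d = full
        while d != 0:
            # Read the window right to left and keep only the factors
            # that are still compatible with the characters read.
            d &= mask.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    last = j              # a pattern prefix starts at pos + j
                else:
                    matches.append(pos)   # the whole window matched
            d = (d << 1) & full
        pos += last
    return matches
```

For example, `bndm_search("abra", "abracadabra")` returns `[0, 7]`, and `bndm_search(["h", "ae", "t"], "hat het hit")` returns `[0, 4]`, since the second position accepts either `a` or `e`.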
Partially supported by Chilean Fondecyt grant 1-950622.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Navarro, G., Raffinot, M. (1998). A bit-parallel approach to suffix automata: Fast extended string matching. In: Farach-Colton, M. (eds) Combinatorial Pattern Matching. CPM 1998. Lecture Notes in Computer Science, vol 1448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030778
DOI: https://doi.org/10.1007/BFb0030778
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64739-3
Online ISBN: 978-3-540-69054-2
eBook Packages: Springer Book Archive