Abstract
We present a unified view of sequential algorithms for many pattern matching problems, based on a finite automaton that is built from the pattern and takes the text as input. We show the limitations of deterministic finite automata (DFA) and the advantages of a bit-wise simulation of non-deterministic finite automata (NFA). This approach gives very fast practical algorithms with good complexity for small patterns on a RAM machine with word length O(log n), where n is the size of the text and m is the length of the pattern. For generalized string matching the time complexity is O(mn/log n), which is linear for small patterns. For approximate string matching we show that the two main known approaches to the problem are variations of the NFA simulation. For this case we present a different simulation technique which, for small patterns, gives a running time of O(n) independently of the maximum number of errors allowed, k. This algorithm improves on the best bit-wise or comparison-based algorithms, which run in O(kn) time, and can be used as a building block for algorithms with good average-case behavior. We also formalize a previous bit-wise simulation of general NFAs, achieving O(mn log log n/log n) time.
This work was partially funded by Fondecyt Chilean Grant 95-0622.
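The bit-wise NFA simulation mentioned in the abstract can be illustrated with a minimal sketch in the classical Shift-And style, extended to classes of characters as in generalized string matching. This is an illustration under stated assumptions, not necessarily the formulation used in the paper: the function name `shift_and_search` and the representation of the pattern as a sequence of allowed-character collections are hypothetical choices made here. Python integers are unbounded, so the sketch does not model the O(log n) word length assumed in the complexity bounds.

```python
def shift_and_search(pattern, text):
    """Return the start indices of all matches of pattern in text.

    pattern is a sequence whose i-th item is the collection of characters
    allowed at position i (a plain string works: one character per position).
    The representation is an assumption made for this sketch.
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set when character c is allowed at position i.
    masks = {}
    for i, allowed in enumerate(pattern):
        for c in allowed:
            masks[c] = masks.get(c, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit of the final NFA state
    state = 0              # bit i set <=> pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, c in enumerate(text):
        # Advance every active state by one position, re-activate the
        # initial state, and keep only the states compatible with c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

For example, `shift_and_search(["a", "bc", "a"], "acaxaba")` reports the occurrences starting at positions 0 and 4. Each text character costs a constant number of word operations as long as the pattern fits in one machine word, which is the regime the abstract calls small patterns.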
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R. (1996). A unified view to string matching algorithms. In: Jeffery, K.G., Král, J., Bartošek, M. (eds) SOFSEM'96: Theory and Practice of Informatics. SOFSEM 1996. Lecture Notes in Computer Science, vol 1175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0037393
DOI: https://doi.org/10.1007/BFb0037393
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61994-9
Online ISBN: 978-3-540-49588-8
eBook Packages: Springer Book Archive