Abstract
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k) + k/c)), where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
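The two problem statements above can be made concrete with a small sketch. This is not the authors' algorithm: the shift rule in `approx_bm_mismatches` is a derivation in the same Boyer-Moore spirit, assuming that a shift s is only worth trying if at least one of the last k + 1 characters of the current window would match the pattern after the shift (otherwise those k + 1 characters alone already give k + 1 mismatches). All function names are hypothetical. The k differences part uses the classical dynamic-programming method of Sellers (listed in the references) as a simple baseline, not the Boyer-Moore-style k differences algorithm the abstract describes.

```python
from typing import Dict, List


def build_shift_tables(pattern: str, k: int) -> List[Dict[str, int]]:
    """Hypothetical helper. Table h (h = 0 is the rightmost window position)
    maps a character a to the smallest shift s in 1..m-k-1 that puts an
    occurrence of a in the pattern under text position j - h.
    Characters missing from a table get the default shift m - k."""
    m = len(pattern)
    tables: List[Dict[str, int]] = [dict() for _ in range(k + 1)]
    for h in range(k + 1):
        # Walk s downward so that smaller shifts overwrite larger ones.
        for s in range(m - k - 1, 0, -1):
            tables[h][pattern[m - 1 - h - s]] = s
    return tables


def approx_bm_mismatches(text: str, pattern: str, k: int) -> List[int]:
    """Start positions where pattern occurs in text with at most k mismatches."""
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("requires 0 <= k < len(pattern)")
    tables = build_shift_tables(pattern, k)
    default = m - k
    hits: List[int] = []
    j = m - 1  # text index aligned with the last pattern character
    while j < n:
        # Scan the window right to left, stopping once k + 1 mismatches are seen.
        mismatches = 0
        i = m - 1
        while i >= 0 and mismatches <= k:
            if text[j - m + 1 + i] != pattern[i]:
                mismatches += 1
            i -= 1
        if mismatches <= k:
            hits.append(j - m + 1)
        # Any shift smaller than this one would leave all of the last k + 1
        # window characters mismatched, so it cannot produce an occurrence.
        j += min(tables[h].get(text[j - h], default) for h in range(k + 1))
    return hits


def sellers_k_differences(text: str, pattern: str, k: int) -> List[int]:
    """Sellers-style dynamic programming baseline: 0-based indices of the last
    character of text substrings within edit distance k of pattern."""
    m = len(pattern)
    col = list(range(m + 1))  # edit distances against the empty text prefix
    ends: List[int] = []
    for j, c in enumerate(text):
        # Row 0 stays 0: an occurrence may start at any text position.
        diag, col[0] = col[0], 0
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            diag, col[i] = col[i], min(
                col[i] + 1,      # text character c left unmatched
                col[i - 1] + 1,  # pattern character pattern[i - 1] left unmatched
                diag + cost,     # match or change
            )
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `approx_bm_mismatches("abracadabra", "abra", 1)` returns `[0, 7]`, and `sellers_k_differences("abd", "abc", 1)` returns `[1, 2]` (the substrings "ab" and "abd" are each one edit away from "abc"). The shift is capped at m − k because only shifts up to that point keep all k + 1 inspected characters inside the new window, which is what makes skipping them safe.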
Extended Abstract
References
R. Baeza-Yates: Efficient Text Searching. Ph.D. Thesis, Report CS-89-17, University of Waterloo, Computer Science Department, 1989.
R. Baeza-Yates: String searching algorithms revisited. In: Proceedings of the Workshop on Algorithms and Data Structures (ed. F. Dehne et al.), Lecture Notes in Computer Science 382, Springer-Verlag, Berlin, 1989, 75–96.
R. Boyer and S. Moore: A fast string searching algorithm. Communications of the ACM 20 (1977), 762–772.
Z. Galil and R. Giancarlo: Improved string matching with k mismatches. SIGACT News 17 (1986), 52–54.
Z. Galil and R. Giancarlo: Data structures and algorithms for approximate string matching. Journal of Complexity 4 (1988), 33–72.
Z. Galil and K. Park: An improved algorithm for approximate string matching. Proceedings of the 16th International Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science 372, Springer-Verlag, Berlin, 1989, 394–404.
R. Grossi and F. Luccio: Simple and efficient string matching with k mismatches. Information Processing Letters 33 (1989), 113–120.
N. Horspool: Practical fast searching in strings. Software Practice & Experience 10 (1980), 501–506.
P. Jokinen, J. Tarhio and E. Ukkonen: A comparison of approximate string matching algorithms. In preparation.
S. R. Kosaraju: Efficient string matching. Extended abstract. Johns Hopkins University, 1988.
D. Knuth, J. Morris and V. Pratt: Fast pattern matching in strings. SIAM Journal on Computing 6 (1977), 323–350.
G. Landau and U. Vishkin: Fast string matching with k differences. Journal of Computer and System Sciences 37 (1988), 63–78.
G. Landau and U. Vishkin: Fast parallel and serial approximate string matching. Journal of Algorithms 10 (1989), 157–169.
P. Sellers: The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms 1 (1980), 359–372.
J. Tarhio and E. Ukkonen: Approximate Boyer-Moore string matching. Report A-1990-3. Department of Computer Science, University of Helsinki, 1990.
E. Ukkonen: Algorithms for approximate string matching. Information and Control 64 (1985), 100–118.
E. Ukkonen: Finding approximate patterns in strings. Journal of Algorithms 6 (1985), 132–137.
E. Ukkonen and D. Wood: Fast approximate string matching with suffix automata. Manuscript, 1989.
R. Wagner and M. Fischer: The string-to-string correction problem. Journal of the ACM 21 (1975), 168–173.
Copyright information
© 1990 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tarhio, J., Ukkonen, E. (1990). Boyer-Moore approach to approximate string matching. In: Gilbert, J.R., Karlsson, R. (eds) SWAT 90. SWAT 1990. Lecture Notes in Computer Science, vol 447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52846-6_103
DOI: https://doi.org/10.1007/3-540-52846-6_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-52846-3
Online ISBN: 978-3-540-47164-6
eBook Packages: Springer Book Archive