Abstract
We consider the problem of finding all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is at most k. We concentrate on a scheme in which T is first preprocessed to make subsequent searches with different patterns P fast. Two preprocessing methods and the corresponding search algorithms are described. The first is based on suffix automata and is applicable to edit distances with general edit operation costs. The second is designed specifically for the unit-cost edit distance and is based on q-gram lists. In both cases the preprocessing takes time and space O(|T|). The search algorithms run in the worst case in time O(|P||T|) or O(k|T|), and in the best case in time O(|P|).
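To make the problem statement concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' suffix-automaton or q-gram search algorithm. The function `approx_end_positions` (a hypothetical name) is the classical column-by-column dynamic program for unit-cost edit distance, which runs in time O(|P||T|) without any preprocessing of T. The function `qgram_index` (also hypothetical) only illustrates the general idea of a q-gram list, mapping each length-q substring of T to the positions where it starts; the data structure used in the paper may differ.

```python
from collections import defaultdict


def approx_end_positions(pattern, text, k):
    """Return all j (0 <= j <= len(text)) such that some substring of text
    ending just before index j is within unit-cost edit distance k of pattern."""
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of text ending at the current text position (initially position 0).
    col = list(range(m + 1))
    hits = [0] if col[m] <= k else []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1); row 0 is always 0
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character left out
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            hits.append(j)
    return hits


def qgram_index(text, q):
    """Map every q-gram of text to the sorted list of its start positions."""
    index = defaultdict(list)
    for start in range(len(text) - q + 1):
        index[text[start:start + q]].append(start)
    return dict(index)


if __name__ == "__main__":
    print(approx_end_positions("abc", "xxabcxx", 1))
    print(qgram_index("abab", 2))
```

The dynamic program scans all of T for every pattern, which is the cost that preprocessing T once is meant to avoid when many patterns are searched in the same static text.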
(Extended Abstract)
Research supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany). The work of the second author was carried out in part while visiting the Institut fuer Informatik, University of Freiburg, Germany.
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
Cite this paper
Jokinen, P., Ukkonen, E. (1991). Two algorithms for approximate string matching in static texts. In: Tarlecki, A. (ed.) Mathematical Foundations of Computer Science 1991. MFCS 1991. Lecture Notes in Computer Science, vol 520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54345-7_67
DOI: https://doi.org/10.1007/3-540-54345-7_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54345-9
Online ISBN: 978-3-540-47579-8
eBook Packages: Springer Book Archive