Two algorithms for approximate string matching in static texts

Jokinen, Petteri; Ukkonen, Esko

doi:10.1007/3-540-54345-7_67

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 520)

Included in the following conference series:

International Symposium on Mathematical Foundations of Computer Science

Abstract

The problem of finding all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is at most k is considered. We concentrate on a scheme in which T is first preprocessed to make subsequent searches with different patterns P fast. Two preprocessing methods and the corresponding search algorithms are described. The first is based on suffix automata and is applicable to edit distances with general edit operation costs. The second is a special design for the unit-cost edit distance and is based on q-gram lists. In both cases the preprocessing needs time and space O(|T|). The search algorithms run in the worst case in time O(|P||T|) or O(k|T|), and in the best case in time O(|P|).

(Extended Abstract)
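
To make the problem statement concrete, the following is a minimal Python sketch of a baseline, not of either algorithm from the paper. It assumes unit-cost edit distance and uses the classical column-by-column dynamic programming over T without any preprocessing, which takes O(|P||T|) time. The helper `qgram_lists` shows one plausible reading of a "q-gram list" (each length-q substring of T mapped to its start positions); the abstract does not specify the exact structure, so this layout and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def approx_end_positions(pattern, text, k):
    """Return every end index j (exclusive) such that some substring of
    text ending at j is within unit-cost edit distance k of pattern.

    Baseline dynamic programming, no preprocessing of the text.
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    if col[m] <= k:
        hits.append(0)  # the empty substring already matches when m <= k
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character missing
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits


def qgram_lists(text, q):
    """Map each q-gram of text to the sorted list of its start positions.

    Hypothetical layout of a q-gram index; built in one pass over text.
    """
    lists = defaultdict(list)
    for i in range(len(text) - q + 1):
        lists[text[i:i + q]].append(i)
    return dict(lists)


print(approx_end_positions("abc", "xxabdxx", 1))  # [4, 5]
print(qgram_lists("abab", 2))  # {'ab': [0, 2], 'ba': [1]}
```

In the first call, the substrings "ab" (ending at 4) and "abd" (ending at 5) are each one edit away from "abc". A static-text scheme such as the one the abstract describes would aim to avoid rescanning all of T for every new pattern, which this baseline does not attempt.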

Research supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany). The work of the second author was in part carried out when visiting Institut fuer Informatik, University of Freiburg, Germany.

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Teollisuuskatu 23, SF-00510, Helsinki, Finland
Petteri Jokinen & Esko Ukkonen

Editor information

Andrzej Tarlecki

About this paper

Cite this paper

Jokinen, P., Ukkonen, E. (1991). Two algorithms for approximate string matching in static texts. In: Tarlecki, A. (eds) Mathematical Foundations of Computer Science 1991. MFCS 1991. Lecture Notes in Computer Science, vol 520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54345-7_67

DOI: https://doi.org/10.1007/3-540-54345-7_67
Published: 08 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54345-9
Online ISBN: 978-3-540-47579-8
eBook Packages: Springer Book Archive
