Approximate string-matching over suffix trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 684)

Abstract

The classical approximate string-matching problem of finding the locations of approximate occurrences P′ of pattern string P in text string T such that the edit distance between P and P′ is ≤ k is considered. We concentrate on the special case in which T is available for preprocessing before the searches with varying P and k. It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree. Three variations of the search algorithm are developed with running times $O(mq + n)$, $O(mq \log q + \text{size of the output})$, and $O(m^2 q + \text{size of the output})$. Here $n = |T|$, $m = |P|$, and q varies depending on the problem instance between 0 and n. In the case of the unit cost edit distance it is shown that $q = O(\min(n, m^{k+1}|\Sigma|^k))$, where $\Sigma$ is the alphabet.
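
To make the idea of dynamic programming over a tree of suffixes concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a naive, uncompressed suffix trie (quadratic size, no suffix links) in place of the suffix tree, uses the unit cost edit distance, and prunes a branch as soon as every entry of its dynamic programming column exceeds k. The names `build_suffix_trie` and `approx_starts` are hypothetical.

```python
def build_suffix_trie(text):
    """Naive O(n^2) suffix trie (an assumption; the paper uses a suffix tree).

    Each node records the start positions of all suffixes passing through it.
    """
    root = {"children": {}, "starts": list(range(len(text)))}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(i)
    return root


def approx_starts(root, pattern, k):
    """Return sorted positions i such that some substring of the text
    starting at i is within unit cost edit distance k of pattern."""
    m = len(pattern)
    hits = set()
    # col[j] = edit distance between pattern[:j] and the string spelled
    # out on the path from the root to the current node.
    stack = [(root, list(range(m + 1)))]
    while stack:
        node, col = stack.pop()
        if col[m] <= k:
            # The path string is an approximate occurrence; every suffix
            # through this node starts with it, so report all of them.
            hits.update(node["starts"])
            continue
        if min(col) > k:
            # No extension of this path can come back under k: prune.
            continue
        for ch, child in node["children"].items():
            new = [col[0] + 1]
            for j in range(1, m + 1):
                new.append(min(col[j] + 1,                          # extra text character
                               new[j - 1] + 1,                      # extra pattern character
                               col[j - 1] + (pattern[j - 1] != ch)))  # match or substitution
            stack.append((child, new))
    return sorted(hits)


trie = build_suffix_trie("abracadabra")
print(approx_starts(trie, "abra", 0))  # [0, 7]
print(approx_starts(trie, "abra", 1))  # [0, 1, 6, 7, 8]
```

The trie is built once and reused for different P and k, which mirrors the preprocessing setting of the abstract. The number of columns evaluated here depends on how many trie paths stay within distance k, which is loosely analogous to the instance-dependent quantity q, although this sketch makes no claim to the stated running times.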

This work was supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany).

Editor information

Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ukkonen, E. (1993). Approximate string-matching over suffix trees. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1993. Lecture Notes in Computer Science, vol 684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029808

  • DOI: https://doi.org/10.1007/BFb0029808

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56764-6

  • Online ISBN: 978-3-540-47732-7

  • eBook Packages: Springer Book Archive
