
Pattern matching with mismatches: A probabilistic analysis and a randomized algorithm

Extended abstract

  • Conference paper

Combinatorial Pattern Matching (CPM 1992)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 644)


Abstract

Given a text of length n and a pattern of length m over some (possibly unbounded) alphabet, we consider the problem of finding all positions in the text at which the pattern “almost occurs”. Here by “almost occurs” we mean that at least some fixed fraction ρ of the characters of the pattern (for example, ≥ 60% of them) are equal to their corresponding characters in the text. We design a randomized algorithm that has O(n log m) worst-case time complexity and computes with high probability all of the almost-occurrences of the pattern in the text. This algorithm assumes that the fraction ρ is given as part of its input, and it works well even for relatively small values of ρ. It makes no assumptions about the probabilistic characteristics of the input. Our second contribution deals with the issue of which values of ρ correspond to the intuitive notion of similarity between pattern and text, and this leads us to the development of a probabilistic analysis for the case where both input strings are random (in the usual, i.e., Bernoulli, model).
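To make the definition of an "almost-occurrence" concrete, here is a minimal brute-force sketch in Python. It is not the randomized O(n log m) algorithm described in the abstract; it is a plain O(nm) baseline that checks every alignment directly, and the function name `almost_occurrences` is a hypothetical choice for illustration. It assumes positions are 0-based and that "at least a fraction ρ" means the number of matching characters is at least ρ·m.

```python
def almost_occurrences(text, pattern, rho):
    """Return all 0-based positions i where at least rho * m characters of
    pattern equal the corresponding characters of text[i:i+m].

    Brute-force O(n*m) baseline (an illustrative assumption, not the
    paper's randomized method).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    hits = []
    for i in range(n - m + 1):
        # Count position-by-position character agreements at alignment i.
        matches = sum(1 for j in range(m) if text[i + j] == pattern[j])
        if matches >= rho * m:
            hits.append(i)
    return hits


# With rho = 0.75 and a pattern of length 4, three of four characters must agree.
print(almost_occurrences("abcxabcdxbcd", "abcd", 0.75))  # [0, 4, 8]
# With rho = 1.0 only the exact occurrence remains.
print(almost_occurrences("abcxabcdxbcd", "abcd", 1.0))   # [4]
```

A baseline like this is mainly useful as a reference against which the output of a faster randomized method could be checked on small inputs.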

The first author's research was supported by the Office of Naval Research under Grants N0014-84-K-0502 and N0014-36-K-0689, in part by AFOSR Grant 90-0107, by the NSF under Grant DCR-8451393, and in part by Grant R01 LM05118 from the National Library of Medicine. The second author was supported by NATO Collaborative Grant 0057/89. The third author's research was supported by AFOSR Grant 90-0107 and NATO Collaborative Grant 0057/89, and in part by NSF Grant CCR-8900305 and by Grant R01 LM05118 from the National Library of Medicine.




Editor information

Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber


Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Atallah, M.J., Jacquet, P., Szpankowski, W. (1992). Pattern matching with mismatches: A probabilistic analysis and a randomized algorithm. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_3


  • DOI: https://doi.org/10.1007/3-540-56024-6_3


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56024-1

  • Online ISBN: 978-3-540-47357-2

  • eBook Packages: Springer Book Archive
