Fast two-dimensional approximate pattern matching

  • Conference paper
LATIN'98: Theoretical Informatics (LATIN 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1380)

Abstract

We address the problem of approximate string matching in two dimensions, that is, to find a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (substitutions, insertions and deletions). Although the problem can be solved using dynamic programming in time $O(m^2 n^2)$, this is in general too expensive for small $k$. So we design a filtering algorithm which avoids verifying most of the text with dynamic programming. This filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of our resulting algorithm is $O(n^2 k \log_\sigma m / m^2)$ for $k < m(m+1)/(5 \log_\sigma m)$, which is optimal and matches the best previous result which allows only substitutions. For higher error levels, we present an algorithm with time complexity $O(n^2 k/(w\sqrt{\sigma}))$ (where $w$ is the size in bits of the computer word and $\sigma$ is the alphabet size). This algorithm works for $k < m(m+1)(1 - e/\sqrt{\sigma})$, where $e = 2.718\ldots$, a limit which is not possible to improve. These are the first good expected-case algorithms for the problem. Our algorithms work also for rectangular patterns and rectangular text and can even be extended to the case where each row in the pattern and the text has a different length.
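
As a rough illustration of the one-dimensional building block mentioned in the abstract, the sketch below shows the classical column-wise dynamic programming search for a single pattern with at most k errors (the textbook approach, not the multi-pattern filter of the paper itself). It also evaluates the two error limits quoted in the abstract for given m and σ. The function names `approx_search` and `error_limits` are hypothetical and chosen only for this example.

```python
import math


def approx_search(pattern, text, k):
    """Return the 0-based end positions in `text` where `pattern`
    matches some substring with at most k errors
    (substitutions, insertions, deletions).

    Classical one-dimensional dynamic programming, O(m * n) time.
    Illustrative only: this is not the filtering algorithm of the paper.
    """
    m = len(pattern)
    # col[i] = fewest errors to match pattern[:i] against some
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # col[i-1] of the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # missing pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


def error_limits(m, sigma):
    """Evaluate the two bounds on k stated in the abstract.

    Returns (low, high), where
      low  = m(m+1) / (5 log_sigma m)
      high = m(m+1) (1 - e / sqrt(sigma))
    """
    low = m * (m + 1) / (5 * math.log(m, sigma))
    high = m * (m + 1) * (1 - math.e / math.sqrt(sigma))
    return low, high
```

For example, `approx_search("abc", "xxabdxx", 1)` returns `[3, 4]`: the substrings "ab" and "abd" each match "abc" with one error. Note that the second limit is only positive when σ exceeds e² (about 7.39), so under this formula the higher-error algorithm gives no usable range for very small alphabets.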

Support from Fondecyt grants 1-95-0622 and 1-96-0881 is gratefully acknowledged.

Editor information

Cláudio L. Lucchesi, Arnaldo V. Moura

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baeza-Yates, R., Navarro, G. (1998). Fast two-dimensional approximate pattern matching. In: Lucchesi, C.L., Moura, A.V. (eds) LATIN'98: Theoretical Informatics. LATIN 1998. Lecture Notes in Computer Science, vol 1380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054334

  • DOI: https://doi.org/10.1007/BFb0054334

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64275-6

  • Online ISBN: 978-3-540-69715-2

  • eBook Packages: Springer Book Archive
