Approximate parameterized matching

Published: 01 August 2007

Abstract

Two equal-length strings s and s′, over alphabets Σ_s and Σ_s′, parameterize match if there exists a bijection π : Σ_s → Σ_s′ such that π(s) = s′, where π(s) is the renaming of each character of s via π. Parameterized matching is the problem of finding all parameterized matches of a pattern string p in a text t, and approximate parameterized matching is the problem of finding at each location a bijection π that maximizes the number of characters that are mapped from p to the appropriate |p|-length substring of t.
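
The definition above can be illustrated with a minimal sketch. It is not the algorithm of the article, only a direct check of the definition followed by a naive scan over the text. The function names are hypothetical, and the alphabets are assumed to be exactly the sets of characters occurring in each string.

```python
def parameterized_match(s: str, t: str) -> bool:
    """Return True if some bijection between the alphabets renames s into t.

    Assumption: each alphabet is the set of characters occurring in its string.
    """
    if len(s) != len(t):
        return False
    forward = {}   # pi: character of s -> character of t
    backward = {}  # inverse of pi, used to enforce injectivity
    for a, b in zip(s, t):
        # setdefault records the pairing on first sight and returns the
        # previously recorded partner otherwise.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def find_parameterized_matches(p: str, t: str) -> list[int]:
    """Naive O(n * m) scan: start indices where p parameterize matches t."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if parameterized_match(p, t[i:i + m])]
```

For instance, `parameterized_match("abab", "xyxy")` holds via a → x, b → y, whereas `parameterized_match("abc", "xxy")` fails because a and b would both have to map to x.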

Parameterized matching was introduced as a model for software duplication detection in software maintenance systems and also has applications in image processing and computational biology. For example, approximate parameterized matching models image searching with variable color maps in the presence of errors.

We consider the problem for which an error threshold, k, is given, and the goal is to find all locations in t for which there exists a bijection π which maps p into the appropriate |p|-length substring of t with at most k mismatched mapped elements. Our main result is an algorithm for this problem with O(nk^{1.5} + mk log m) time complexity, where m = |p| and n = |t|. We also show that when |p| = |t| = m, the problem is equivalent to the maximum matching problem on graphs, yielding an O(m + k^{1.5}) solution.
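
The thresholded problem can be made concrete with a brute-force sketch for the equal-length case. It is emphatically not the O(m + k^{1.5}) method of the article: it tries every bijection, so it is only usable on tiny alphabets. It does show why the problem has a matching flavor, since the number of preserved positions is a sum of pair counts over the chosen character pairs. The function names are hypothetical, and padding the smaller alphabet with a dummy symbol is an assumption made here so that a bijection always exists.

```python
from collections import Counter
from itertools import permutations


def min_mismatches(s: str, t: str) -> int:
    """Fewest mismatched positions over all bijections (exponential time)."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    # pair_count[(a, b)] = number of positions i with s[i] == a and t[i] == b.
    pair_count = Counter(zip(s, t))
    left = sorted(set(s))
    right = sorted(set(t))
    # Assumption: pad the smaller alphabet with dummy symbols (None);
    # pairs involving a dummy never occur, so they contribute weight 0.
    size = max(len(left), len(right))
    left += [None] * (size - len(left))
    right += [None] * (size - len(right))
    best = 0
    for image in permutations(right):
        # Positions preserved when left[j] is mapped to image[j].
        preserved = sum(pair_count[(a, b)] for a, b in zip(left, image))
        best = max(best, preserved)
    return len(s) - best


def approx_parameterized_matches(p: str, t: str, k: int) -> list[int]:
    """Start indices where some bijection leaves at most k mismatches."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if min_mismatches(p, t[i:i + m]) <= k]
```

A practical implementation would replace the loop over permutations with a maximum-weight bipartite matching routine on the pair-count graph.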

      • Published in

        ACM Transactions on Algorithms, Volume 3, Issue 3
        August 2007
        216 pages
        ISSN: 1549-6325
        EISSN: 1549-6333
        DOI: 10.1145/1273340

        Copyright © 2007 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
