Abstract
Two equal-length strings s and s′, over alphabets Σs and Σs′, parameterize-match if there exists a bijection π : Σs → Σs′ such that π(s) = s′, where π(s) is the string obtained by renaming each character of s via π. Parameterized matching is the problem of finding all parameterized matches of a pattern string p in a text t. Approximate parameterized matching is the problem of finding, at each location, a bijection π that maximizes the number of characters that are mapped from p to the corresponding |p|-length substring of t.
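The definition can be checked directly. Build π one position at a time, and reject as soon as a character would receive two different images or two characters would share the same image. This is a naive sketch. The function names are hypothetical, offsets are 0-based, and each alphabet is taken to be the set of characters that actually occur. The full search takes O(nm) time, so it is not an efficient parameterized-matching algorithm.

```python
def is_parameterized_match(s: str, s_prime: str) -> bool:
    """True if some one-to-one renaming pi of s's characters turns s into s_prime."""
    if len(s) != len(s_prime):
        return False
    forward: dict[str, str] = {}   # pi: character of s -> character of s_prime
    backward: dict[str, str] = {}  # pi^{-1}, used to enforce injectivity
    for a, b in zip(s, s_prime):
        # setdefault records the first image seen and returns the stored one
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def parameterized_matches(p: str, t: str) -> list[int]:
    """Naive O(nm) scan: every 0-based offset i where p p-matches t[i:i+len(p)]."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if is_parameterized_match(p, t[i:i + m])]
```

For example, `parameterized_matches("aba", "xyxyxzz")` reports offsets 0, 1 and 2. The windows "xyx" and "yxy" both have the shape of "aba", but "yxz" and "xzz" do not.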
Parameterized matching was introduced as a model for software duplication detection in software maintenance systems and also has applications in image processing and computational biology. For example, approximate parameterized matching models image searching with variable color maps in the presence of errors.
We consider the variant in which an error threshold k is given. The goal is to find all locations in t for which there exists a bijection π that maps p into the corresponding |p|-length substring of t with at most k mismatched mapped elements. Our main result is an algorithm for this problem with $O(nk^{1.5} + mk\log m)$ time complexity, where m = |p| and n = |t|. We also show that when |p| = |t| = m, the problem is equivalent to the maximum matching problem on graphs, yielding an $O(m + k^{1.5})$ solution.
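One natural way to phrase the equal-length case is as a maximum-weight bipartite matching. Let w(a, b) be the number of positions i with p[i] = a and t[i] = b. A bijection π then maps exactly the sum of w(a, π(a)) characters correctly, and the minimum number of mismatches is m minus the best such sum.

The sketch below rests on one extra assumption: every one-to-one assignment of the characters occurring in p can be extended to a full bijection π. All function names are hypothetical. The solver is the generic Hungarian method, which is cubic in the number of distinct characters. It is a stand-in for illustration, not the $O(m + k^{1.5})$ algorithm described above. The threshold search simply reruns the solver at every text location.

```python
from collections import Counter


def max_weight_assignment(weight: list[list[int]]) -> int:
    """Hungarian method on a square N x N matrix; returns the maximum total weight."""
    n = len(weight)
    INF = float("inf")
    # Minimize cost = -weight; arrays are 1-indexed as in the classic formulation.
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row and column potentials
    match = [0] * (n + 1)  # match[j] = row assigned to column j (0 = none)
    way = [0] * (n + 1)    # predecessor column on the alternating path
    for i in range(1, n + 1):
        match[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = -weight[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # augment along the alternating path back to column 0
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sum(weight[match[j] - 1][j - 1] for j in range(1, n + 1))


def min_mismatches_equal_length(p: str, t: str) -> int:
    """Fewest positions i with pi(p[i]) != t[i] over one-to-one renamings pi."""
    assert len(p) == len(t)
    rows, cols = sorted(set(p)), sorted(set(t))
    size = max(len(rows), len(cols))  # pad with zero-weight dummy characters
    pair_counts = Counter(zip(p, t))  # w(a, b) = #{i : p[i] = a and t[i] = b}
    weight = [[pair_counts[(rows[r], cols[c])] if r < len(rows) and c < len(cols) else 0
               for c in range(size)] for r in range(size)]
    return len(p) - max_weight_assignment(weight)


def approximate_pmatch_locations(p: str, t: str, k: int) -> list[int]:
    """0-based offsets where some renaming of p has at most k mismatches."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if min_mismatches_equal_length(p, t[i:i + m]) <= k]
```

For p = "aabb" against "xxyz", the best renaming is a→x, b→y (or b→z), which leaves one mismatch. Padding the weight matrix to a square lets unmatched characters fall onto dummy partners of weight 0, so they never reduce the count.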
Index Terms
- Approximate parameterized matching