Skip to main content
Log in

Approximate Pattern Matching with the L 1, L 2 and L Metrics

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

Given an alphabet Σ={1,2,…,|Σ|} text string T∈Σn and a pattern string P∈Σm, for each i=1,2,…,nm+1 define L p (i) as the p-norm distance when the pattern is aligned below the text and starts at position i of the text. The problem of pattern matching with L p distance is to compute L p (i) for every i=1,2,…,nm+1. We discuss the problem for d=1,2,∞. First, in the case of L 1 matching (pattern matching with an L 1 distance) we show a reduction of the string matching with mismatches problem to the L 1 matching problem and we present an algorithm that approximates the L 1 matching up to a factor of 1+ε, which has an \(O(\frac{1}{\varepsilon^{2}}n\log m\log|\Sigma|)\) run time. Then, the L 2 matching problem (pattern matching with an L 2 distance) is solved with a simple O(nlog m) time algorithm. Finally, we provide an algorithm that approximates the L matching up to a factor of 1+ε with a run time of \(O(\frac{1}{\varepsilon}n\log m\log|\Sigma|)\) . We also generalize the problem of String Matching with mismatches to have weighted mismatches and present an O(nlog 4 m) algorithm that approximates the results of this problem up to a factor of O(log m) in the case that the weight function is a metric.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Abrahamson, K.: Generalized string matching. SIAM J. Comput. 16(6), 1039–1051 (1987)

    Article  MATH  MathSciNet  Google Scholar 

  2. Amir, A., Farach, M.: Efficient 2-dimensional approximate matching of half-rectangular figures. Inf. Comput. 118(1), 1–11 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  3. Amir, A., Farach, M., Muthukrishnan, S.: Alphabet dependence in parameterized matching. Inf. Process. Lett. 49, 111–115 (1994)

    Article  MATH  Google Scholar 

  4. Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. J. Algorithms 50(2), 257–275 (2004). Special SODA 2000 issue

    Article  MATH  MathSciNet  Google Scholar 

  5. Atallah, M.J., Chyzak, F., Dumas, P.: A randomized algorithm for approximate string matching. Algorithmica 29(3), 468–486 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  6. Baker, B.S.: A theory of parameterized pattern matching: algorithms and applications. In: Proc. 25th Annual ACM Symposium on the Theory of Computing, pp. 71–80 (1993)

  7. Chu, K.K.W., Wong, M.H.: Fast time-series searching with scaling and shifting. In: Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 31–June 2, 1999, Philadelphia, Pennsylvania, pp. 237–248. ACM, New York (1999)

    Chapter  Google Scholar 

  8. Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. of the 34th Annual ACM Symposium on Theory of Computing, pp. 592–601. ACM, New York (2002)

    Google Scholar 

  9. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge (1992)

    Google Scholar 

  10. Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 667–676. ACM, New York (2002)

    Google Scholar 

  11. Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: Proceedings 1994 ACM SIGMOD Conference, Mineapolis, MN, pp. 419–429 (1994)

  12. Fischer, M.J., Paterson, M.S.: String matching and other products. In: Karp, R.M. (ed.) Complexity of Computation. SIAM-AMS Proceedings, vol.7, pp. 113–125 (1974)

  13. Hirschberg, D.S.: Serial computations of Levenshtein distances. In: Apostolico, A., Galil, Z. (eds.) Pattern Matching Algorithms, pp. 123–141. Oxford University Press, London (1997)

    Google Scholar 

  14. Indyk, P.: Faster algorithms for string matching problems: matching the convolution bound. In: Proceedings of the 39th IEEE Annual Symposium on Foundations of Computer Science, pp. 166–173. IEEE Comput. Soc., Los Alamitos (1998)

    Google Scholar 

  15. Indyk, P.: Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proc. 41st Symp. Foundations of Computer Science, pp. 189–197. IEEE Press, New York (2000)

    Chapter  Google Scholar 

  16. Indyk, P.: Algorithmic applications of low-distortion geometric embeddings. In: Proceedings of the 42th IEEE Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc., Los Alamitos (2001)

    Google Scholar 

  17. Indyk, P., Koudas, N., Muthukrishnan, S.: Identifying representative trends in massive time series data sets using sketches. In: El Abbadi, A., Brodie, M.L., Chakravarthy, S., Dayal, U., Kamel, N., Schlageter, G., Whang, K.-Y. (eds.) VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10–14, 2000, Cairo, Egypt, pp. 363–372. Morgan Kaufmann, San Mateo (2000)

    Google Scholar 

  18. Indyk, P., Lewenstein, M., Lipsky, O., Porat, E.: Closest pair problems in very high dimensions. In: Proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP). Lecture Notes in Computer Science, vol. 3142, pp. 782–792. Springer, Berlin (2004)

    Chapter  Google Scholar 

  19. Karloff, H.: Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48(2), 53–60 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  20. Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6, 323–350 (1977)

    Article  MATH  MathSciNet  Google Scholar 

  21. Levenshtein, V.I.: Binary codes capable of correcting, deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)

    MathSciNet  Google Scholar 

  22. Lipsky, O., Porat, E.: L1 pattern matching lower bound. In: Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE), Bs. As., Argentina. Lecture Notes in Computer Science, vol. 3772, pp. 327–330. Springer, Berlin (2005)

    Google Scholar 

  23. Masek, W.J., Paterson, M.S.: A faster algorithm for computing string-edit distances. J. Comput. Syst. Sci. 20, 18–31 (1980)

    Article  MATH  MathSciNet  Google Scholar 

  24. Muthukrishnan, S.: New results and open problems related to non-standard stringology. In: Proc. 6th Combinatorial Pattern Matching Conference. Lecture Notes in Computer Science, vol. 937, pp. 298–317. Springer, Berlin (1995)

    Google Scholar 

  25. Muthukrishnan, S., Ramesh, H.: String matching under a general matching relation. Inf. Comput. 122(1), 140–148 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  26. Perttu, S.: Combinatorial pattern matching in musical sequences. Master’s thesis, Department of Computer Science, University of Helsinki (2000)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ely Porat.

Additional information

Research supported in part by US-Israel Binational Science Foundation.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lipsky, O., Porat, E. Approximate Pattern Matching with the L 1, L 2 and L Metrics. Algorithmica 60, 335–348 (2011). https://doi.org/10.1007/s00453-009-9345-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00453-009-9345-9

Keywords

Navigation