On Approximate Jumbled Pattern Matching in Strings

Theory of Computing Systems

Abstract

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u, v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u ≤ q ≤ v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then discuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t) ≤ v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.
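The problem definitions above can be made concrete with a small Python sketch. It is not the authors' method: where the paper uses a wavelet tree of s for sub-linear expected query time, the sketch swaps in a plain linear-time sliding window, and for the binary case it builds a naive O(n^2) table and answers a query by scanning candidate window lengths rather than in constant time. All function names are hypothetical. We read a "maximal occurrence" as a substring t with u ≤ p(t) ≤ v that cannot be extended by one character on either side without exceeding v; the paper's precise definition may differ. The Scrabble-like variant assumes strictly positive letter weights, in which case every maximum-weight t with p(t) ≤ v is a maximal occurrence for u = 0. The binary query relies on the fact that the number of 1s in a length-m window changes by at most one as the window slides, so every count between the minimum and maximum over length-m windows is attained.

```python
from collections import Counter


def parikh(s, alphabet):
    """Parikh vector p(s): multiplicity of each letter of the ordered alphabet."""
    c = Counter(s)
    return tuple(c[a] for a in alphabet)


def jumbled_occurrences(s, q, alphabet):
    """Start positions i with p(s[i:i+m]) == q, m = |q|, via a sliding window (O(n))."""
    m = sum(q)
    if m == 0 or m > len(s):
        return []
    idx = {a: k for k, a in enumerate(alphabet)}
    counts = [0] * len(alphabet)
    for ch in s[:m]:
        counts[idx[ch]] += 1
    target = list(q)
    hits = [0] if counts == target else []
    for i in range(m, len(s)):
        counts[idx[s[i]]] += 1      # letter entering the window
        counts[idx[s[i - m]]] -= 1  # letter leaving the window
        if counts == target:
            hits.append(i - m + 1)
    return hits


def maximal_approx_occurrences(s, u, v, alphabet):
    """Half-open windows (i, j) with u <= p(s[i:j]) <= v that cannot be extended
    left or right without exceeding v (our assumed reading of "maximal")."""
    idx = {a: k for k, a in enumerate(alphabet)}
    counts = [0] * len(alphabet)  # invariant: counts == p(s[i:j])
    hits, j, prev_j = [], 0, -1
    for i in range(len(s)):
        j = max(j, i)
        # Grow the window to the right while every count stays within v.
        while j < len(s) and counts[idx[s[j]]] < v[idx[s[j]]]:
            counts[idx[s[j]]] += 1
            j += 1
        # j == prev_j would mean s[i-1:j] also fits under v, so not left-maximal.
        if j > i and j != prev_j and all(c >= lo for c, lo in zip(counts, u)):
            hits.append((i, j))
        prev_j = j
        if j > i:
            counts[idx[s[i]]] -= 1  # drop s[i] before moving the left end
    return hits


def scrabble_max(s, v, weight, alphabet):
    """Max total weight of a substring t with p(t) <= v, and the windows attaining it.
    Assumes strictly positive weights, so the optimum is a maximal occurrence for u = 0."""
    cands = maximal_approx_occurrences(s, [0] * len(alphabet), v, alphabet)
    w = {(i, j): sum(weight[ch] for ch in s[i:j]) for i, j in cands}
    best = max(w.values(), default=0)
    return best, [ij for ij in cands if w[ij] == best]


def binary_index(s):
    """For each window length m, the min and max number of '1's (naive O(n^2) build)."""
    n = len(s)
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "1"))
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for m in range(1, n + 1):
        ones = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(ones), max(ones)
    return lo, hi


def binary_approx_exists(index, u, v):
    """Does some q = (#0s, #1s) with u <= q <= v occur in s? A length-m window with
    x ones exists iff lo[m] <= x <= hi[m]. Scans lengths, so not constant time."""
    lo, hi = index
    n = len(lo) - 1
    for m in range(max(1, u[0] + u[1]), min(n, v[0] + v[1]) + 1):
        x_min = max(u[1], m - v[0])  # fewest ones keeping #0s <= v[0]
        x_max = min(v[1], m - u[0])  # most ones keeping #0s >= u[0]
        if max(x_min, lo[m]) <= min(x_max, hi[m]):
            return True
    return False


s = "aabab"
print(parikh(s, "ab"))                                      # (3, 2)
print(jumbled_occurrences(s, (1, 1), "ab"))                 # [1, 2, 3]
print(maximal_approx_occurrences(s, (1, 1), (2, 1), "ab"))  # [(0, 3), (1, 4), (3, 5)]
print(scrabble_max(s, (2, 1), {"a": 1, "b": 3}, "ab"))      # (5, [(0, 3), (1, 4)])
idx = binary_index("0110100")
print(binary_approx_exists(idx, (2, 2), (2, 2)))            # True ("0110")
print(binary_approx_exists(idx, (0, 3), (0, 3)))            # False (no "111")
```

In the example, "ba" at positions 2..3 satisfies the bounds but is not reported, because extending it one letter to the left gives "aba", which still fits under v = (2, 1).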

References

  1. Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5–6), 409–421 (2003)

  2. Babai, L., Felzenszwalb, P.F.: Computing rank-convolutions with a mask. ACM Trans. Algorithms 6(1), 1–13 (2009)

  3. Benson, G.: Composition alignment. In: Proc. of the 3rd International Workshop on Algorithms in Bioinformatics (WABI’03), pp. 447–461 (2003)

  4. Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)

  5. Böcker, S., Jahn, K., Mixtacki, J., Stoye, J.: Computation of median gene clusters. J. Comput. Biol. 16(8), 1085–1099 (2009)

  6. Böcker, S., Lipták, Zs.: A fast and simple algorithm for the money changing problem. Algorithmica 48(4), 413–432 (2007)

  7. Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Taslakian, P.: Necklaces, convolutions, and X+Y. In: 14th Annual European Symposium on Algorithms (ESA’06), pp. 160–171 (2006)

  8. Burcsi, P., Cicalese, F., Fici, G., Lipták, Zs.: Algorithms for jumbled pattern matching in strings. In: 5th International Conference FUN with Algorithms (FUN), pp. 89–101 (2010)

  9. Burcsi, P., Cicalese, F., Fici, G., Lipták, Zs.: Algorithms for jumbled pattern matching in strings. Int. J. Found. Comput. Sci. (2011, to appear)

  10. Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)

  11. Cicalese, F., Fici, G., Lipták, Zs.: Searching for jumbled patterns in strings. In: Proc. of the Prague Stringology Conference 2009 (PSC’09), pp. 105–117 (2009)

  12. Cieliebak, M., Erlebach, T., Lipták, Zs., Stoye, J., Welzl, E.: Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Appl. Math. 137(1), 27–46 (2004)

  13. Clark, D.: Compact pat trees. Ph.D. thesis, University of Waterloo, Canada (1996)

  14. Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. J. Comput. Biol. 11(6), 1050–1060 (2004)

  15. Goczyła, K.: The generalized Banach match-box problem: application in disc storage management. Acta Appl. Math. 5, 27–36 (1986)

  16. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’03), pp. 841–850 (2003)

  17. Jokinen, P., Tarhio, J., Ukkonen, E.: A comparison of approximate string matching algorithms. Softw. Pract. Exp. 26(12), 1439–1458 (1996)

  18. Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  19. Mendelson, H., Pliskin, J., Yechiali, U.: Optimal storage allocation for serial files. Commun. ACM 22, 124–130 (1979)

  20. Mendelson, H., Pliskin, J., Yechiali, U.: A stochastic allocation problem. Oper. Res. 28, 687–693 (1980)

  21. Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Inf. Process. Lett. 110, 795–798 (2010)

  22. Moosa, T.M., Rahman, M.S.: Sub-quadratic time and linear size data structures for permutation matching in binary strings. J. Discrete Algorithms (2011, to appear)

  23. Munro, J.I.: Tables. In: Proc. of Foundations of Software Technology and Theoretical Computer Science (FSTTCS’96), pp. 37–42 (1996)

  24. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)

  25. Parida, L.: Gapped permutation patterns for comparative genomics. In: Proc. of WABI 2006, pp. 376–387 (2006)

Author information

Correspondence to Ferdinando Cicalese.

Additional information

Some of the results in this paper appeared in preliminary form in the proceedings of FUN 2010 [8].

About this article

Cite this article

Burcsi, P., Cicalese, F., Fici, G. et al. On Approximate Jumbled Pattern Matching in Strings. Theory Comput Syst 50, 35–51 (2012). https://doi.org/10.1007/s00224-011-9344-5

  • DOI: https://doi.org/10.1007/s00224-011-9344-5