On Approximate Jumbled Pattern Matching in Strings

Burcsi, Péter; Cicalese, Ferdinando; Fici, Gabriele; Lipták, Zsuzsanna

doi:10.1007/s00224-011-9344-5

Published: 11 June 2011

Theory of Computing Systems, Volume 50, pages 35–51 (2012)

Péter Burcsi¹, Ferdinando Cicalese², Gabriele Fici³ & Zsuzsanna Lipták⁴

Abstract

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t)=q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u,v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u≤q≤v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then discuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t)≤v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.
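To make the definitions concrete, the following Python sketch computes Parikh vectors and answers the approximate query by brute force. It is not the wavelet-tree algorithm described above: it enumerates all substrings and is intended only as a slow reference for checking faster implementations. All function names are hypothetical, and the notion of maximality used here (an occurrence that cannot be extended by one character to the left or to the right without exceeding v) is an assumption of this sketch. The second part illustrates, for binary strings, the interval property that makes constant-time jumbled queries possible: for a fixed window length m, the number of 1s over all windows of length m takes every value between its minimum and maximum, because sliding the window by one position changes the count by at most one. Storing that minimum and maximum per length answers exact (not approximate) queries in O(1). The index here is built naively in O(n²) time, not with the subquadratic indexing mentioned in the abstract.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]


def parikh(t: str, alphabet: str) -> Vector:
    """Parikh vector of t: multiplicity of each letter of `alphabet` in t."""
    counts = Counter(t)
    return tuple(counts[c] for c in alphabet)


def leq(a: Vector, b: Vector) -> bool:
    """Componentwise order on Parikh vectors: a <= b."""
    return all(x <= y for x, y in zip(a, b))


def approx_jumbled_occurrences(
    s: str, alphabet: str, u: Vector, v: Vector
) -> List[Tuple[int, int]]:
    """Brute-force reference: half-open pairs (i, j) with u <= p(s[i:j]) <= v.

    ASSUMPTION: an occurrence is reported only if it is maximal, i.e. adding
    s[i-1] or s[j] would exceed v (or the string ends there). Extending can
    only increase counts, so the lower bound u never needs re-checking.
    Time O(n^2 * (n + sigma)); this is NOT the paper's wavelet-tree method.
    """
    assert set(s) <= set(alphabet), "every letter of s must be in alphabet"
    n = len(s)
    found = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            q = parikh(s[i:j], alphabet)
            if not (leq(u, q) and leq(q, v)):
                continue
            can_go_left = i > 0 and leq(parikh(s[i - 1:j], alphabet), v)
            can_go_right = j < n and leq(parikh(s[i:j + 1], alphabet), v)
            if not can_go_left and not can_go_right:
                found.append((i, j))
    return found


def binary_jumbled_index(s: str) -> Dict[int, Tuple[int, int]]:
    """For s over {'0', '1'}: map window length m -> (min, max) number of 1s.

    Hypothetical helper with a naive O(n^2) construction via prefix sums,
    not the subquadratic indexing referred to in the abstract.
    """
    n = len(s)
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "1"))
    index = {}
    for m in range(1, n + 1):
        ones = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        index[m] = (min(ones), max(ones))
    return index


def has_jumbled_occurrence(
    index: Dict[int, Tuple[int, int]], zeros: int, ones: int
) -> bool:
    """O(1) exact query: does some substring have `zeros` 0s and `ones` 1s?

    Correct because sliding a window of length m by one position changes its
    number of 1s by at most one, so every value in [min, max] is attained.
    """
    m = zeros + ones
    if m not in index:
        return False
    lo, hi = index[m]
    return lo <= ones <= hi
```

For example, with s = "abaab", u = (1, 1) and v = (2, 1), the brute-force search reports the half-open pairs (0, 3), (1, 4) and (2, 5), that is, the substrings "aba", "baa" and "aab"; shorter candidates such as "ab" at position 0 are dropped because they can still be extended within v.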

References

[1] Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5–6), 409–421 (2003)
[2] Babai, L., Felzenszwalb, P.F.: Computing rank-convolutions with a mask. ACM Trans. Algorithms 6(1), 1–13 (2009)
[3] Benson, G.: Composition alignment. In: Proc. of the 3rd International Workshop on Algorithms in Bioinformatics (WABI’03), pp. 447–461 (2003)
[4] Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)
[5] Böcker, S., Jahn, K., Mixtacki, J., Stoye, J.: Computation of median gene clusters. J. Comput. Biol. 16(8), 1085–1099 (2009)
[6] Böcker, S., Lipták, Zs.: A fast and simple algorithm for the money changing problem. Algorithmica 48(4), 413–432 (2007)
[7] Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Taslakian, P.: Necklaces, convolutions, and X+Y. In: 14th Annual European Symposium on Algorithms (ESA’06), pp. 160–171 (2006)
[8] Burcsi, P., Cicalese, F., Fici, G., Lipták, Zs.: Algorithms for jumbled pattern matching in strings. In: 5th International Conference FUN with Algorithms (FUN), pp. 89–101 (2010)
[9] Burcsi, P., Cicalese, F., Fici, G., Lipták, Zs.: Algorithms for jumbled pattern matching in strings. Int. J. Found. Comput. Sci. (2011, to appear)
[10] Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)
[11] Cicalese, F., Fici, G., Lipták, Zs.: Searching for jumbled patterns in strings. In: Proc. of the Prague Stringology Conference 2009 (PSC’09), pp. 105–117 (2009)
[12] Cieliebak, M., Erlebach, T., Lipták, Zs., Stoye, J., Welzl, E.: Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Appl. Math. 137(1), 27–46 (2004)
[13] Clark, D.: Compact pat trees. Ph.D. thesis, University of Waterloo, Canada (1996)
[14] Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. J. Comput. Biol. 11(6), 1050–1060 (2004)
[15] Goczyła, K.: The generalized Banach match-box problem: application in disc storage management. Acta Appl. Math. 5, 27–36 (1986)
[16] Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’03), pp. 841–850 (2003)
[17] Jokinen, P., Tarhio, J., Ukkonen, E.: A comparison of approximate string matching algorithms. Softw. Pract. Exp. 26(12), 1439–1458 (1996)
[18] Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
[19] Mendelson, H., Pliskin, J., Yechiali, U.: Optimal storage allocation for serial files. Commun. ACM 22, 124–130 (1979)
[20] Mendelson, H., Pliskin, J., Yechiali, U.: A stochastic allocation problem. Oper. Res. 28, 687–693 (1980)
[21] Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Inf. Process. Lett. 110, 795–798 (2010)
[22] Moosa, T.M., Rahman, M.S.: Sub-quadratic time and linear size data structures for permutation matching in binary strings. J. Discrete Algorithms (2011, to appear)
[23] Munro, J.I.: Tables. In: Proc. of Foundations of Software Technology and Theoretical Computer Science (FSTTCS’96), pp. 37–42 (1996)
[24] Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)
[25] Parida, L.: Gapped permutation patterns for comparative genomics. In: Proc. of WABI 2006, pp. 376–387 (2006)

Author information

Authors and Affiliations

Department of Computer Algebra, Eötvös Loránd University, Budapest, Hungary
Péter Burcsi
Dipartimento di Informatica ed Applicazioni, University of Salerno, Salerno, Italy
Ferdinando Cicalese
I3S, UMR6070, CNRS et Université de Nice-Sophia Antipolis, France
Gabriele Fici
AG Genominformatik, Technische Fakultät, Bielefeld University, Bielefeld, Germany
Zsuzsanna Lipták

Corresponding author

Correspondence to Ferdinando Cicalese.

Additional information

Some of the results in this paper appeared in preliminary form in the proceedings of FUN 2010 [8].

About this article

Cite this article

Burcsi, P., Cicalese, F., Fici, G. et al. On Approximate Jumbled Pattern Matching in Strings. Theory Comput Syst 50, 35–51 (2012). https://doi.org/10.1007/s00224-011-9344-5

Published: 11 June 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s00224-011-9344-5

Similar content being viewed by others

Efficient Indexes for Jumbled Pattern Matching with Constant-Sized Alphabet

Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings

Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance
