Abstract
In this article we consider the following question: N words of length L are generated using a biased memoryless source, i.e., each letter is drawn independently according to some fixed distribution on the alphabet, and collected in a set (duplicates are removed); what are the frequencies of the letters in a typical element of this random set? We prove that the typical frequency distribution of such a word can be characterized by considering the parameter \(\ell = L/\log N\). We exhibit two thresholds \(\ell _0<\ell _1\) that depend only on the source, such that if \(\ell \le \ell _0\), the distribution resembles the uniform distribution; if \(\ell \ge \ell _1\), it resembles the distribution of the source; and for \(\ell _0\le \ell \le \ell _1\), we characterize the distribution as an interpolation of the two extremal distributions.
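As a rough numerical illustration (not the method of the paper), the process can be explored for a hypothetical binary alphabet {a, b} with P(a) = p. A specific word containing k letters a has probability \(p^k q^{L-k}\) with \(q = 1-p\), so it appears at least once among the N draws with probability \(1-(1-p^k q^{L-k})^N\). The sketch below weights each composition k by its expected number of distinct words and reports the resulting mean frequency of a. This ratio of expectations is an assumed approximation of the frequency in a typical element, not the exact quantity studied by the authors; the function name `expected_mean_frequency` and the binary setting are illustrative choices.

```python
import math


def expected_mean_frequency(L: int, p: float, N: float) -> float:
    """Approximate mean frequency of letter 'a' over the distinct words.

    Hypothetical binary source with P(a) = p. Returns the ratio
    E[sum of k/L over distinct words] / E[number of distinct words],
    which is only an approximation of the typical-element frequency.
    """
    q = 1.0 - p
    num = 0.0  # expected sum of the 'a'-frequencies k/L of distinct words
    den = 0.0  # expected number of distinct words
    for k in range(L + 1):
        # Probability of one specific word with exactly k letters 'a'.
        pi_k = p**k * q ** (L - k)
        # P(that word is drawn at least once in N draws) = 1 - (1 - pi_k)^N,
        # evaluated stably via expm1/log1p because pi_k can be tiny.
        seen = -math.expm1(N * math.log1p(-pi_k))
        weight = math.comb(L, k) * seen  # expected distinct words with k a's
        num += weight * k / L
        den += weight
    return num / den


if __name__ == "__main__":
    L, p = 20, 0.8
    for ell in (0.1, 1.0, 2.0, 5.0, 50.0):
        N = math.exp(L / ell)  # N = exp(L / ell), i.e. ell = L / log N
        m = expected_mean_frequency(L, p, N)
        print(f"ell={ell:5}: N~{N:.3g}, mean frequency of 'a' ~ {m:.4f}")
```

With L = 20 and p = 0.8, small values of ℓ make N so large that every word is collected, pushing the mean frequency toward 1/2 (uniform), while large values of ℓ leave duplicates rare and push it toward 0.8 (the source). The thresholds \(\ell_0\) and \(\ell_1\) themselves are not computed here.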
Notes
1. It is more natural to view the process with L fixed and N varying; this, however, leads to the somewhat artificial parameterization \(N=\exp (L/\ell )\).
2. By “roughly” we mean up to some multiplicative power of L, with \(L=\varTheta (\log N)\) at our scale.
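For concreteness, the two notes amount to the following elementary rewriting, taking log to be the natural logarithm (an assumption consistent with the parameterization \(N=\exp (L/\ell )\)):

\[
\ell = \frac{L}{\log N}
\iff \log N = \frac{L}{\ell}
\iff N = e^{L/\ell},
\qquad \text{so for fixed } \ell:\quad L = \ell \log N = \varTheta (\log N).
\]

For instance, \(L = 20\) and \(\ell = 2\) give \(N = e^{10} \approx 22026\).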
Acknowledgments
The authors are grateful to Arnaud Carayol for his invaluable help in preparing this article, and to an anonymous reviewer for suggesting the promising alternative \(\alpha \)-parameterization of the problem.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Duchon, P., Nicaud, C. (2018). On the Biased Partial Word Collector Problem. In: Bender, M., Farach-Colton, M., Mosteiro, M. (eds) LATIN 2018: Theoretical Informatics. LATIN 2018. Lecture Notes in Computer Science, vol. 10807. Springer, Cham. https://doi.org/10.1007/978-3-319-77404-6_30
DOI: https://doi.org/10.1007/978-3-319-77404-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77403-9
Online ISBN: 978-3-319-77404-6
eBook Packages: Computer Science, Computer Science (R0)