Abstract
Let \(S_T(k)\) denote the set of distinct substrings of length \(k\) in a string \(T\); its cardinality \(|S_T(k)|\) is called the \(k\)-th substring complexity of \(T\). Recently, \(\delta = \max\{|S_T(k)|/k : k \ge 1\}\) has been shown to be a good compressibility measure for highly repetitive strings. In this paper, given \(T\) of length \(n\) in run-length compressed form of size \(\rho\), we show that \(\delta\) can be computed in \(C_{\textsf{sort}}(\rho, n)\) time and \(O(\rho)\) space, where \(C_{\textsf{sort}}(\rho, n) = O(\min(\rho \lg\lg \rho, \rho \lg_{\rho} n))\) is the time complexity of sorting \(\rho\) integers of \(O(\lg n)\) bits each in \(O(\rho)\) space in the Word-RAM model with word size \(\varOmega(\lg n)\).
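To make the definitions of \(|S_T(k)|\), \(\delta\) and \(\rho\) concrete, here is a minimal brute-force sketch, assuming the run-length form is given as a list of (character, run length) pairs. It simply decompresses \(T\) and evaluates the definition for every \(k \le n\), so it takes roughly quadratic time and space in \(n\). It is only a reference check of the definition, not the paper's \(C_{\textsf{sort}}(\rho, n)\)-time algorithm, which works directly on the runs. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import groupby


def run_length_encode(t: str) -> list[tuple[str, int]]:
    """Return the runs of t as (character, length) pairs; len(result) is rho."""
    return [(c, len(list(group))) for c, group in groupby(t)]


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (character, length) pairs back into the plain string T."""
    return "".join(c * length for c, length in runs)


def substring_complexity(t: str, k: int) -> int:
    """|S_T(k)|: the number of distinct substrings of length k in t."""
    return len({t[i:i + k] for i in range(len(t) - k + 1)})


def delta(t: str) -> Fraction:
    """delta = max_{k >= 1} |S_T(k)| / k, evaluated by brute force.

    Only k in 1..n needs checking, because |S_T(k)| = 0 for k > n.
    Returning 0 for the empty string is a convention assumed here.
    """
    n = len(t)
    if n == 0:
        return Fraction(0)
    return max(Fraction(substring_complexity(t, k), k) for k in range(1, n + 1))


runs = run_length_encode("aaababbbaa")
print(runs)        # [('a', 3), ('b', 1), ('a', 1), ('b', 3), ('a', 2)]
print(len(runs))   # rho = 5
print(delta(run_length_decode(runs)))  # 8/3, attained at k = 3
```

The example string contains all eight binary strings of length 3, so \(|S_T(3)|/3 = 8/3\) exceeds \(|S_T(1)|/1 = 2\). This shows why the maximum must range over all \(k\) and not just \(k = 1\).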
Acknowledgements
This work was supported by JSPS KAKENHI (Grant Number 22K11907).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kawamoto, A., I, T. (2022). Substring Complexities on Run-Length Compressed Strings. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_10
DOI: https://doi.org/10.1007/978-3-031-20643-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)