Substring Complexities on Run-Length Compressed Strings

  • Conference paper
String Processing and Information Retrieval (SPIRE 2022)

Abstract

Let \(S_{T}(k)\) denote the set of distinct substrings of length k in a string T. Its cardinality \(|S_{T}(k)|\) is called the k-th substring complexity of T. Recently, \(\delta = \max \{ |S_{T}(k)| / k : k \ge 1 \}\) has been shown to be a good compressibility measure for highly repetitive strings. In this paper, given a string T of length n in run-length compressed form of size \(\rho \), we show that \(\delta \) can be computed in \(C_{\textsf{sort}}(\rho , n)\) time and \(O(\rho )\) space, where \(C_{\textsf{sort}}(\rho , n) = O(\min (\rho \lg \lg \rho , \rho \lg _{\rho } n))\) is the time complexity of sorting \(\rho \) integers of \(O(\lg n)\) bits each in \(O(\rho )\) space in the Word-RAM model with word size \(\varOmega (\lg n)\).
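
To make the definitions above concrete, the following is a minimal brute-force sketch that evaluates \(|S_{T}(k)|\) and \(\delta \) directly from their definitions on an uncompressed string, and also builds the run-length encoding whose number of runs corresponds to \(\rho \). It is not the algorithm of the paper: it takes time polynomial in n rather than \(C_{\textsf{sort}}(\rho , n)\), and the function names (run_length_encode, substring_complexity, delta) are illustrative assumptions, not names taken from the paper.

```python
from fractions import Fraction


def run_length_encode(text):
    """Return the run-length encoding of text as a list of (char, length) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Extend the current run of identical characters.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((ch, 1))
    return runs


def substring_complexity(text, k):
    """Return |S_T(k)|, the number of distinct substrings of length k."""
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def delta(text):
    """Return max over k >= 1 of |S_T(k)| / k, as an exact fraction.

    Brute force over every k from 1 to n; assumes text is non-empty.
    """
    if not text:
        raise ValueError("delta is only evaluated here for non-empty strings")
    return max(
        Fraction(substring_complexity(text, k), k)
        for k in range(1, len(text) + 1)
    )


if __name__ == "__main__":
    t = "aabbb"
    print(run_length_encode(t))        # two runs, so rho = 2 for this string
    print(substring_complexity(t, 2))  # distinct substrings: aa, ab, bb
    print(delta(t))
```

For T = aabbb the substring complexities for k = 1, ..., 5 are 2, 3, 3, 2, 1, so the ratios \(|S_{T}(k)| / k\) are 2, 3/2, 1, 1/2, 1/5 and the maximum is attained at k = 1. Exact fractions are used so that ratios are compared without floating-point rounding.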

Acknowledgements

This work was supported by JSPS KAKENHI (Grant Number 22K11907).

Author information

Corresponding author

Correspondence to Akiyoshi Kawamoto.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kawamoto, A., I, T. (2022). Substring Complexities on Run-Length Compressed Strings. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_10

  • DOI: https://doi.org/10.1007/978-3-031-20643-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20642-9

  • Online ISBN: 978-3-031-20643-6

  • eBook Packages: Computer Science, Computer Science (R0)
