Generalization of Repetitiveness Measures for Two-Dimensional Strings

  • Conference paper
String Processing and Information Retrieval (SPIRE 2024)

Abstract

Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures \(\delta\) and \(\gamma\), defined in terms of the submatrices of the input, as well as the measures \(g\), \(g_{rl}\), and \(b\), which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.
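
To make the submatrix-based measures more concrete, the following is a minimal brute-force sketch of a two-dimensional \(\delta\). The abstract does not state the formula, so the code assumes the natural analogue of the one-dimensional definition \(\delta = \max_k d_k / k\): the maximum, over all shapes \(k_1 \times k_2\), of the number of distinct \(k_1 \times k_2\) submatrices divided by \(k_1 k_2\). The function names `distinct_submatrices` and `delta_2d` are hypothetical, and the enumeration is meant for small illustrative inputs only, not as the authors' algorithm.

```python
from typing import Sequence, Set, Tuple


def distinct_submatrices(matrix: Sequence[str], k1: int, k2: int) -> int:
    """Count the distinct k1 x k2 submatrices of a matrix given as equal-length rows."""
    m, n = len(matrix), len(matrix[0])
    seen: Set[Tuple[str, ...]] = set()
    for i in range(m - k1 + 1):
        for j in range(n - k2 + 1):
            # A submatrix is identified by the tuple of its row slices.
            seen.add(tuple(matrix[r][j:j + k2] for r in range(i, i + k1)))
    return len(seen)


def delta_2d(matrix: Sequence[str]) -> float:
    """Assumed 2D delta: max over shapes of (#distinct submatrices) / (k1 * k2)."""
    m, n = len(matrix), len(matrix[0])
    return max(
        distinct_submatrices(matrix, k1, k2) / (k1 * k2)
        for k1 in range(1, m + 1)
        for k2 in range(1, n + 1)
    )


if __name__ == "__main__":
    print(delta_2d(["aa", "aa"]))  # a constant matrix
    print(delta_2d(["ab", "cd"]))  # four distinct symbols
```

Under this assumed definition, a constant matrix has value 1, while a matrix of pairwise distinct symbols has a value equal to its number of cells, because the \(1 \times 1\) shape already attains the maximum.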

Notes

  1. Analogously to the 1D setting, the parse tree of a 2D SLP is an ordered labeled tree where S is the root, and the children of a variable A are the variables in its right-hand side (possibly repeated).

References

  1. Bannai, H., et al.: The smallest grammar problem revisited. IEEE Trans. Inf. Theory 67(1), 317–328 (2021)

  2. Berman, P., Karpinski, M., Larmore, L.L., Plandowski, W., Rytter, W.: On the complexity of pattern matching for highly compressed two-dimensional texts. J. Comput. Syst. Sci. 65(2), 332–350 (2002)

  3. Brisaboa, N.R., Gagie, T., Gómez-Brandón, A., Navarro, G.: Two-dimensional block trees. Comput. J. 67(1), 391–406 (2024)

  4. Carfagna, L., Manzini, G.: Compressibility measures for two-dimensional data. In: Proceedings of the 30th International Symposium on String Processing and Information Retrieval, SPIRE 2023. LNCS, vol. 14240, pp. 102–113. Springer, Heidelberg (2023). https://doi.org/10.1007/978-3-031-43980-3_9

  5. Carfagna, L., Manzini, G.: The landscape of compressibility measures for two-dimensional data. IEEE Access 12, 87268–87283 (2024)

  6. Charikar, M., et al.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)

  7. Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algor. 17(1), 8:1–8:39 (2021)

  8. Gagie, T., Navarro, G., Prezza, N.: On the approximation ratio of Lempel-Ziv parsing. In: Bender, M.A., Farach-Colton, M., Mosteiro, M.A. (eds.) LATIN 2018. LNCS, vol. 10807, pp. 490–503. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77404-6_36

  9. Gallant, J.K.: String Compression Algorithms. Ph.D. thesis, Princeton University (1982)

  10. Ganardi, M., Jez, A., Lohrey, M.: Balancing straight-line programs. J. ACM 68(4), 27:1–27:40 (2021)

  11. Giammarresi, D., Restivo, A.: Two-dimensional languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, pp. 215–267. Springer, Heidelberg (1997). https://doi.org/10.1007/978-3-642-59126-6_4

  12. Giancarlo, R.: A generalization of the suffix tree to square matrices, with applications. SIAM J. Comput. 24(3), 520–562 (1995)

  13. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: STOC, pp. 827–840. ACM (2018)

  14. Kociumaka, T., Navarro, G., Prezza, N.: Toward a definitive compressibility measure for repetitive sequences. IEEE Trans. Inf. Theory 69(4), 2074–2092 (2023)

  15. Lempel, A., Ziv, J.: Compression of two-dimensional data. IEEE Trans. Inf. Theory 32(1), 2–8 (1986)

  16. Mantaci, S., Restivo, A., Romana, G., Rosone, G., Sciortino, M.: A combinatorial view on string attractors. Theor. Comput. Sci. 850, 236–248 (2021)

  17. Navarro, G.: Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. 54(2), 26 (2021)

  18. Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), 29 (2021)

  19. Navarro, G., Ochoa, C., Prezza, N.: On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory 67(2), 1008–1026 (2021)

Acknowledgments

LC and GM are partially funded by the PNRR ECS00000017 Tuscany Health Ecosystem, Spoke 6, CUP I53C22000780001, funded by the NextGeneration EU programme, and by the spoke “FutureHPC & BigData” of the ICSC—Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing, funded by the NextGeneration EU programme.

GR and MS are partially funded by the MUR PRIN Project “PINC, Pangenome INformatiCs: from Theory to Applications” (Grant No. 2022YRB97K).

LC, GM, MS, and GR are partially funded by the INdAM-GNCS Project CUP E53C23001670001.

CU is partially funded by ANID-Subdirección de Capital Humano/Doctorado Nacional/2021-21210580, ANID, Chile, by Basal Funds FB0001, ANID, Chile, and by Fondecyt Grant 1-230755.

Author information

Corresponding author

Correspondence to Giuseppe Romana.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Carfagna, L., Manzini, G., Romana, G., Sciortino, M., Urbina, C. (2025). Generalization of Repetitiveness Measures for Two-Dimensional Strings. In: Lipták, Z., Moura, E., Figueroa, K., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2024. Lecture Notes in Computer Science, vol 14899. Springer, Cham. https://doi.org/10.1007/978-3-031-72200-4_5

  • DOI: https://doi.org/10.1007/978-3-031-72200-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72199-1

  • Online ISBN: 978-3-031-72200-4

  • eBook Packages: Computer Science (R0)
