Abstract
Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures \(\delta\) and \(\gamma\), defined in terms of the submatrices of the input, as well as the measures \(g\), \(g_{rl}\), and \(b\), which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.
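To make the submatrix-based measures more concrete, the following is a minimal brute-force sketch of a two-dimensional \(\delta\). The abstract does not state the definition, so the code assumes the natural analogue of the one-dimensional measure: the maximum, over all shapes \(a \times b\), of the number of distinct \(a \times b\) submatrices divided by \(a \cdot b\). The function name `delta_2d` and the input representation (a list of equal-length rows) are hypothetical choices made for illustration, and the enumeration is far too slow for real inputs.

```python
from typing import Sequence


def delta_2d(matrix: Sequence[Sequence[str]]) -> float:
    """Brute-force sketch of a 2D delta measure.

    ASSUMPTION (not taken from the abstract): delta is the maximum, over
    all shapes (a, b), of (#distinct a-by-b submatrices) / (a * b).
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    best = 0.0
    for a in range(1, rows + 1):
        for b in range(1, cols + 1):
            # Collect every a-by-b submatrix as a hashable tuple of row tuples.
            distinct = {
                tuple(tuple(matrix[i + di][j:j + b]) for di in range(a))
                for i in range(rows - a + 1)
                for j in range(cols - b + 1)
            }
            best = max(best, len(distinct) / (a * b))
    return best


if __name__ == "__main__":
    print(delta_2d(["aaa", "aaa", "aaa"]))  # a unary matrix
    print(delta_2d(["ab", "ba"]))           # a 2x2 checkerboard
```

Under this assumed definition, a matrix over a single symbol has exactly one distinct submatrix of each shape, so the maximum is reached at shape \(1 \times 1\).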
Notes
1. Analogously to the 1D setting, the parse tree of a 2D SLP is an ordered labeled tree where S is the root, and the children of a variable A are the variables in its right-hand side (possibly repeated).
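The parse tree described in this note can be illustrated with a small sketch. The rule format below is an assumption, since the note does not spell it out: each variable either produces a single terminal symbol or is the horizontal (`"H"`) or vertical (`"V"`) concatenation of two variables. The names `expand` and `parse_tree_size` and the dictionary encoding of the grammar are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# ASSUMED rule format: a terminal symbol (str), or a triple (op, left, right)
# where op is "H" (side by side) or "V" (one on top of the other).
Rule = Union[str, Tuple[str, str, str]]


def expand(rules: Dict[str, Rule], var: str) -> List[str]:
    """Return the 2D string generated by `var`, as a list of rows."""
    rule = rules[var]
    if isinstance(rule, str):
        return [rule]
    op, left, right = rule
    first, second = expand(rules, left), expand(rules, right)
    if op == "H":
        if len(first) != len(second):
            raise ValueError("horizontal concatenation needs equal heights")
        return [x + y for x, y in zip(first, second)]
    if op == "V":
        if len(first[0]) != len(second[0]):
            raise ValueError("vertical concatenation needs equal widths")
        return first + second
    raise ValueError(f"unknown operator: {op}")


def parse_tree_size(rules: Dict[str, Rule], var: str) -> int:
    """Count parse-tree nodes; a repeated variable is counted at each occurrence."""
    rule = rules[var]
    if isinstance(rule, str):
        return 1
    _, left, right = rule
    return 1 + parse_tree_size(rules, left) + parse_tree_size(rules, right)


if __name__ == "__main__":
    grammar: Dict[str, Rule] = {
        "A": "a",
        "B": "b",
        "R": ("H", "A", "B"),
        "T": ("H", "B", "A"),
        "S": ("V", "R", "T"),
    }
    print(expand(grammar, "S"))
    print(parse_tree_size(grammar, "S"))
```

In this toy grammar the variables A and B each occur twice in the parse tree rooted at S, which is what "possibly repeated" refers to: the tree can be much larger than the grammar itself.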
References
Bannai, H., et al.: The smallest grammar problem revisited. IEEE Trans. Inf. Theory 67(1), 317–328 (2021)
Berman, P., Karpinski, M., Larmore, L.L., Plandowski, W., Rytter, W.: On the complexity of pattern matching for highly compressed two-dimensional texts. J. Comput. Syst. Sci. 65(2), 332–350 (2002)
Brisaboa, N.R., Gagie, T., Gómez-Brandón, A., Navarro, G.: Two-dimensional block trees. Comput. J. 67(1), 391–406 (2024)
Carfagna, L., Manzini, G.: Compressibility measures for two-dimensional data. In: Proceedings of the 30th International Symposium on String Processing and Information Retrieval, SPIRE 2023. LNCS, vol. 14240, pp. 102–113. Springer, Heidelberg (2023). https://doi.org/10.1007/978-3-031-43980-3_9
Carfagna, L., Manzini, G.: The landscape of compressibility measures for two-dimensional data. IEEE Access 12, 87268–87283 (2024)
Charikar, M., et al.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algor. 17(1), 8:1–8:39 (2021)
Gagie, T., Navarro, G., Prezza, N.: On the approximation ratio of Lempel-Ziv parsing. In: Bender, M.A., Farach-Colton, M., Mosteiro, M.A. (eds.) LATIN 2018. LNCS, vol. 10807, pp. 490–503. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77404-6_36
Gallant, J.K.: String Compression Algorithms. Ph.D. thesis, Princeton University (1982)
Ganardi, M., Jez, A., Lohrey, M.: Balancing straight-line programs. J. ACM 68(4), 27:1–27:40 (2021)
Giammarresi, D., Restivo, A.: Two-dimensional languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, pp. 215–267. Springer, Heidelberg (1997). https://doi.org/10.1007/978-3-642-59126-6_4
Giancarlo, R.: A generalization of the suffix tree to square matrices, with applications. SIAM J. Comput. 24(3), 520–562 (1995)
Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: STOC, pp. 827–840. ACM (2018)
Kociumaka, T., Navarro, G., Prezza, N.: Toward a definitive compressibility measure for repetitive sequences. IEEE Trans. Inf. Theory 69(4), 2074–2092 (2023)
Lempel, A., Ziv, J.: Compression of two-dimensional data. IEEE Trans. Inf. Theory 32(1), 2–8 (1986)
Mantaci, S., Restivo, A., Romana, G., Rosone, G., Sciortino, M.: A combinatorial view on string attractors. Theor. Comput. Sci. 850, 236–248 (2021)
Navarro, G.: Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. 54(2), 26 (2021)
Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), 29 (2021)
Navarro, G., Ochoa, C., Prezza, N.: On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory 67(2), 1008–1026 (2021)
Acknowledgments
LC and GM are partially funded by the PNRR ECS00000017 Tuscany Health Ecosystem, Spoke 6, CUP I53C22000780001, funded by the NextGeneration EU programme, and by the spoke “FutureHPC & BigData” of the ICSC – Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing, funded by the NextGeneration EU programme.
GR and MS are partially funded by the MUR PRIN Project “PINC, Pangenome INformatiCs: from Theory to Applications” (Grant No. 2022YRB97K).
LC, GM, MS, and GR are partially funded by the INdAM-GNCS Project CUP E53C23001670001.
CU is partially funded by ANID-Subdirección de Capital Humano/Doctorado Nacional/2021-21210580, ANID, Chile, by Basal Funds FB0001, ANID, Chile, and by Fondecyt Grant 1-230755.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Carfagna, L., Manzini, G., Romana, G., Sciortino, M., Urbina, C. (2025). Generalization of Repetitiveness Measures for Two-Dimensional Strings. In: Lipták, Z., Moura, E., Figueroa, K., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2024. Lecture Notes in Computer Science, vol 14899. Springer, Cham. https://doi.org/10.1007/978-3-031-72200-4_5
DOI: https://doi.org/10.1007/978-3-031-72200-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72199-1
Online ISBN: 978-3-031-72200-4
eBook Packages: Computer Science, Computer Science (R0)