Abstract
The notion of string attractor was introduced by Kempa and Prezza (STOC 2018) in the context of data compression: it is a set of positions of a finite word such that every factor of the word has an occurrence crossing at least one of these positions, that is, every factor is “attracted” by the set. The smallest size \(\gamma^*\) of a string attractor for a finite word is a lower bound for several repetitiveness measures associated with the most common compression schemes, including BWT-based and LZ-based compressors. The combinatorial properties of the measure \(\gamma^*\) have been studied in [Mantaci et al., TCS 2021]. Very recently, a complexity measure for infinite words, called the string attractor profile function, has been introduced; it evaluates \(\gamma^*\) on each prefix of the word. This measure has been studied for automatic sequences and linearly recurrent infinite words in [Schaeffer and Shallit, arXiv 2021]. In this paper, we study the relationship between this complexity measure and other well-known combinatorial notions related to repetitiveness in infinite words, such as factor complexity and recurrence. Furthermore, we introduce new string attractor-based complexity measures that take into account the structure and distribution of the positions in a string attractor of the prefixes of an infinite word. We show that these measures provide a finer classification of some infinite families of words.
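To make the two central notions concrete, the following is a minimal brute-force sketch, not the authors' method. It checks whether a set of positions is a string attractor, computes \(\gamma^*\) by exhaustive search, and evaluates the profile function on the prefixes of a finite string. The function names are hypothetical, positions are 0-based here (an assumption; the literature usually counts from 1), and the search is exponential, so it is only suitable for very short words.

```python
from itertools import combinations


def is_string_attractor(word, positions):
    """Return True if every factor of `word` has an occurrence that
    crosses at least one of the given 0-based positions."""
    n = len(word)
    pos = set(positions)
    # All distinct non-empty factors (substrings) of the word.
    factors = {word[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    for f in factors:
        covered = False
        start = word.find(f)
        while start != -1:
            # The occurrence spans positions start .. start + len(f) - 1.
            if any(start <= p < start + len(f) for p in pos):
                covered = True
                break
            # Look for the next (possibly overlapping) occurrence.
            start = word.find(f, start + 1)
        if not covered:
            return False
    return True


def smallest_attractor_size(word):
    """Brute-force gamma*: size of a smallest string attractor of `word`."""
    n = len(word)
    for k in range(n + 1):
        for candidate in combinations(range(n), k):
            if is_string_attractor(word, candidate):
                return k
    return n


def attractor_profile(prefix):
    """gamma* of each non-empty prefix of the given finite string."""
    return [smallest_attractor_size(prefix[:m])
            for m in range(1, len(prefix) + 1)]


if __name__ == "__main__":
    print(is_string_attractor("abaab", [1, 2]))  # True
    print(is_string_attractor("abaab", [0, 4]))  # False: "ba" is not crossed
    print(attractor_profile("abab"))             # [1, 2, 2, 2]
```

In the word abaab, the positions {1, 2} form an attractor because every factor has an occurrence through one of them, whereas {0, 4} fails since the only occurrence of ba lies strictly between the two positions.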
References
Allouche, J.P., Shallit, J.: Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press (2003)
Béal, M., Perrin, D., Restivo, A.: Decidable problems in substitution shifts. CoRR abs/2112.14499 (2021)
Cassaigne, J.: Sequences with grouped factors. In: Developments in Language Theory, pp. 211–222. Aristotle University of Thessaloniki (1997)
Cassaigne, J., Karhumäki, J.: Toeplitz words, generalized periodicity and periodically iterated morphisms. Eur. J. Comb. 18(5), 497–510 (1997)
Castiglione, G., Restivo, A., Sciortino, M.: Circular Sturmian words and Hopcroft’s algorithm. Theor. Comput. Sci. 410(43), 4372–4381 (2009)
Castiglione, G., Restivo, A., Sciortino, M.: On extremal cases of Hopcroft’s algorithm. Theor. Comput. Sci. 411(38–39), 3414–3422 (2010)
Castiglione, G., Restivo, A., Sciortino, M.: Hopcroft’s algorithm and cyclic automata. In: Martín-Vide, C., Otto, F., Fernau, H. (eds.) LATA 2008. LNCS, vol. 5196, pp. 172–183. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88282-4_17
Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17(1), 8:1–8:39 (2021)
Constantinescu, S., Ilie, L.: The Lempel-Ziv complexity of fixed points of morphisms. SIAM J. Discret. Math. 21(2), 466–481 (2007)
Damanik, D., Lenz, D.: Substitution dynamical systems: characterization of linear repetitivity and applications. J. Math. Anal. Appl. 321(2), 766–780 (2006)
Durand, F., Perrin, D.: Dimension Groups and Dynamical Systems: Substitutions, Bratteli Diagrams and Cantor Systems. Cambridge Studies in Advanced Mathematics, Cambridge University Press (2022)
Frosini, A., Mancini, I., Rinaldi, S., Romana, G., Sciortino, M.: Logarithmic equal-letter runs for BWT of purely morphic words. In: Diekert, V., Volkov, M. (eds.) Developments in Language Theory. LNCS, vol. 13257, pp. 139–151. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05578-2_11
Heinis, A.: Languages under substitutions and balanced words. Journal de Théorie des Nombres de Bordeaux 16, 151–172 (2004)
Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: STOC 2018, pp. 827–840. ACM (2018)
Knuth, D., Morris, J., Pratt, V.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Kociumaka, T., Navarro, G., Prezza, N.: Towards a definitive measure of repetitiveness. In: Kohayakawa, Y., Miyazawa, F.K. (eds.) LATIN 2021. LNCS, vol. 12118, pp. 207–219. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61792-9_17
Kutsukake, K., Matsumoto, T., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: On repetitiveness measures of Thue-Morse words. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 213–220. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_15
Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE T. Inform. Theory 22(1), 75–81 (1976)
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002)
Mantaci, S., Restivo, A., Sciortino, M.: Burrows-Wheeler transform and Sturmian words. Inform. Process. Lett. 86, 241–246 (2003)
Mantaci, S., Restivo, A., Romana, G., Rosone, G., Sciortino, M.: A combinatorial view on string attractors. Theor. Comput. Sci. 850, 236–248 (2021)
Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), 29:1–29:31 (2021)
Navarro, G.: Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. 54(2), 26:1–26:32 (2021)
Navarro, G.: The compression power of the BWT: technical perspective. Commun. ACM 65(6), 90 (2022)
Pansiot, J.-J.: Complexité des facteurs des mots infinis engendrés par morphismes itérés. In: Paredaens, J. (ed.) ICALP 1984. LNCS, vol. 172, pp. 380–389. Springer, Heidelberg (1984). https://doi.org/10.1007/3-540-13345-3_34
Schaeffer, L., Shallit, J.: String attractors for automatic sequences. CoRR abs/2012.06840 (2021)
Sciortino, M., Zamboni, L.Q.: Suffix automata and standard Sturmian words. In: Harju, T., Karhumäki, J., Lepistö, A. (eds.) DLT 2007. LNCS, vol. 4588, pp. 382–398. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73208-2_36
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Restivo, A., Romana, G., Sciortino, M. (2022). String Attractors and Infinite Words. In: Castañeda, A., Rodríguez-Henríquez, F. (eds) LATIN 2022: Theoretical Informatics. LATIN 2022. Lecture Notes in Computer Science, vol 13568. Springer, Cham. https://doi.org/10.1007/978-3-031-20624-5_26
DOI: https://doi.org/10.1007/978-3-031-20624-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20623-8
Online ISBN: 978-3-031-20624-5
eBook Packages: Computer Science, Computer Science (R0)