Abstract
Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string T is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of T, which makes the CDAWG a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single-character edit operation (insertion, deletion, or substitution) is performed at the left end of the input string T; that is, we study the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if \(\textsf{e}\) is the number of edges of the CDAWG for T, then the number of new edges added to the CDAWG after a left-end edit operation on T is less than \(\textsf{e}\). Further, we present almost matching lower bounds on the sensitivity of CDAWGs for all cases of insertion, deletion, and substitution.
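To make the quantity studied above concrete, the following is a minimal brute-force sketch, not the construction used in the paper. It assumes a unique terminator `$` is appended to T (the paper's exact conventions may differ, so the counts are illustrative and should not be checked directly against the stated bound). Two suffix tree nodes have isomorphic subtrees exactly when their strings have the same set of occurrence end positions, so the sketch groups branching substrings by that set and counts the outgoing edges of each group. The names `cdawg_edge_count` and `left_insertion_increase` are hypothetical helpers introduced here.

```python
def cdawg_edge_count(text: str, terminator: str = "$") -> int:
    """Count CDAWG edges of text + terminator by brute force (small inputs only).

    Assumption: `terminator` does not occur in `text`.
    """
    s = text + terminator
    n = len(s)
    ends = {}       # substring -> set of end positions of its occurrences
    followers = {}  # substring -> set of characters that follow it in s
    for i in range(n + 1):
        for j in range(i, n + 1):
            x = s[i:j]
            ends.setdefault(x, set()).add(j)
            if j < n:
                followers.setdefault(x, set()).add(s[j])

    # Each distinct end-position set of a branching substring is one CDAWG
    # node; the empty string always stands for the source. Substrings in the
    # same class share the same follower set, so overwriting is harmless.
    out_degree = {}
    for x, chars in followers.items():
        if x == "" or len(chars) >= 2:
            out_degree[frozenset(ends[x])] = len(chars)
    return sum(out_degree.values())


def left_insertion_increase(text: str, c: str) -> int:
    """Edge-count change when character c is inserted at the left end of text."""
    return cdawg_edge_count(c + text) - cdawg_edge_count(text)


if __name__ == "__main__":
    print(cdawg_edge_count("abab"))            # 5 under the conventions above
    print(left_insertion_increase("abab", "b"))
```

The sketch enumerates all substrings, so it is only suitable for experimenting with short strings; left-end deletion and substitution can be explored the same way by comparing `cdawg_edge_count(text[1:])` or `cdawg_edge_count(c + text[1:])` against `cdawg_edge_count(text)`.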
References
Akagi, T., Funakoshi, M., Inenaga, S.: Sensitivity of string compressors and repetitiveness measures. Inf. Comput. 291, 104999 (2023). https://doi.org/10.1016/j.ic.2022.104999
Belazzougui, D., Cunial, F.: Fast label extraction in the CDAWG. In: SPIRE 2017, pp. 161–175 (2017). https://doi.org/10.1007/978-3-319-67428-5_14
Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: CPM 2015, pp. 26–39 (2015). https://doi.org/10.1007/978-3-319-19929-0_3
Blumer, A., Blumer, J., Haussler, D., McConnell, R., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. J. ACM 34(3), 578–595 (1987). https://doi.org/10.1145/28869.28873
Burrows, M., Wheeler, D.J.: A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)
Crochemore, M., Vérin, R.: On compact directed acyclic word graphs. In: Mycielski, J., Rozenberg, G., Salomaa, A. (eds.) Structures in Logic and Computer Science. LNCS, vol. 1261, pp. 192–211. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63246-8_12
Fujimaru, H., Nakashima, Y., Inenaga, S.: On sensitivity of compact directed acyclic word graphs. CoRR abs/2303.01726 (2023). https://doi.org/10.48550/arXiv.2303.01726
Inenaga, S., et al.: On-line construction of compact directed acyclic word graphs. Discret. Appl. Math. 146(2), 156–179 (2005). https://doi.org/10.1016/j.dam.2004.04.012
Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: STOC 2018, pp. 827–840 (2018). https://doi.org/10.1145/3188745.3188814
Kociumaka, T., Navarro, G., Prezza, N.: Towards a definitive measure of repetitiveness. In: Kohayakawa, Y., Miyazawa, F.K. (eds.) LATIN 2021. LNCS, vol. 12118, pp. 207–219. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61792-9_17
Radoszewski, J., Rytter, W.: On the structure of compacted subword graphs of Thue-Morse words and their applications. J. Discrete Algorithms 11, 15–24 (2012). https://doi.org/10.1016/j.jda.2011.01.001
Senft, M., Dvořák, T.: Sliding CDAWG perfection. In: SPIRE 2008, pp. 109–120 (2008). https://doi.org/10.1007/978-3-540-89097-3_12
Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size CDAWG: New repetition-aware indexing and grammar compression. In: SPIRE 2017, pp. 304–316 (2017). https://doi.org/10.1007/978-3-319-67428-5_26
Takeda, M., Matsumoto, T., Fukuda, T., Nanri, I.: Discovering characteristic expressions from literary works: a new text analysis method beyond n-gram statistics and KWIC. In: Discovery Science 2000, pp. 112–126 (2000). https://doi.org/10.1007/3-540-44418-1_10
Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual Symposium on Switching and Automata Theory, pp. 1–11. IEEE (1973). https://doi.org/10.1109/SWAT.1973.13
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977). https://doi.org/10.1109/TIT.1977.1055714
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fujimaru, H., Nakashima, Y., Inenaga, S. (2023). On Sensitivity of Compact Directed Acyclic Word Graphs. In: Frid, A., Mercaş, R. (eds) Combinatorics on Words. WORDS 2023. Lecture Notes in Computer Science, vol 13899. Springer, Cham. https://doi.org/10.1007/978-3-031-33180-0_13
DOI: https://doi.org/10.1007/978-3-031-33180-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33179-4
Online ISBN: 978-3-031-33180-0
eBook Packages: Computer Science, Computer Science (R0)