Abstract
It was recently proved that any SLP generating a given string w can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We show that this result also holds for RLSLPs, which are SLPs extended with run-length rules of the form \(A \rightarrow B^t\) for \(t>2\), deriving \(\texttt {exp}(A) = \texttt {exp}(B)^t\). An immediate consequence is the simplification of the algorithm for extracting substrings of an RLSLP-compressed string. We also show that several problems, like answering range minimum queries (RMQs) and computing Karp-Rabin fingerprints on substrings, can be solved in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) time, \(g_{rl}\) being the size of the smallest RLSLP generating the string, of length n. We extend the result to solving more general operations on string ranges, in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) applications of the operation. In general, the smallest RLSLP can be asymptotically smaller than the smallest SLP by up to an \(\mathcal {O}(\log n)\) factor, so our results can make a difference in terms of the space needed for computing these operations efficiently for some string families.
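To make the run-length rule concrete, here is a minimal Python sketch of how an RLSLP derives its string. It assumes a hypothetical tuple encoding of rules (`("term", a)`, `("pair", B, C)`, `("run", B, t)`) that is not taken from the paper, and it expands variables naively, so it is only meant for small examples.

```python
# Hypothetical RLSLP encoding (not from the paper): each variable has one rule
#   ("term", a)    : A -> a
#   ("pair", B, C) : A -> BC
#   ("run", B, t)  : A -> B^t with t > 2

def expand(G: dict, A: str) -> str:
    """Return exp(A) by naive recursive expansion (small inputs only)."""
    kind, *args = G[A]
    if kind == "term":
        return args[0]
    if kind == "pair":
        B, C = args
        return expand(G, B) + expand(G, C)
    if kind == "run":
        B, t = args
        if t <= 2:
            raise ValueError("run-length rules require t > 2")
        return expand(G, B) * t        # exp(A) = exp(B)^t
    raise ValueError(f"unknown rule kind {kind!r}")

G = {
    "a": ("term", "a"), "b": ("term", "b"), "c": ("term", "c"),
    "X": ("pair", "a", "b"),   # exp(X) = ab
    "Y": ("run", "X", 4),      # exp(Y) = (ab)^4, a single rule
    "S": ("pair", "Y", "c"),   # exp(S) = (ab)^4 c
}
print(expand(G, "S"))  # ababababc
```

A plain SLP would need several pair rules (for instance, a doubling chain) to derive \((ab)^4\), whereas the run-length rule for Y does it with one rule.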
Funded in part by Basal Funds FB0001, Fondecyt Grant 1-200038, and two Conicyt Doctoral Scholarships, ANID, Chile.
Notes
1. Seen another way, \(\lambda (A) \not = \lambda (B)\) because \(\log _2 \pi (A,W) = \log _2 (t \cdot \pi (B,W)) = \log _2 t + \log _2 \pi (B,W) > 1 + \log _2 \pi (B,W)\), as \(\log _2 t > 1\) for \(t > 2\).
References
Bille, P., Gørtz, I.L., Cording, P.H., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. J. Comput. Syst. Sci. 86, 171–180 (2017). https://doi.org/10.1016/j.jcss.2017.01.002
Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). https://doi.org/10.1137/130936889
Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Tech. report, Digital SRC Research Report (1994)
Charikar, M., et al.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Christiansen, A., Ettienne, M., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17, 1–39 (2020). https://doi.org/10.1145/3426473
Fischer, J., Mäkinen, V., Navarro, G.: Faster entropy-bounded compressed suffix trees. Theoret. Comput. Sci. 410(51), 5354–5364 (2009)
Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011). https://doi.org/10.1137/090779759
Gagie, T., Navarro, G., Prezza, N.: Optimal-time text indexing in BWT-runs bounded space. In: Proceedings 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1459–1477 (2018)
Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM 67(1), 1–54 (2020). https://doi.org/10.1145/3375890
Ganardi, M., Jeż, A., Lohrey, M.: Balancing straight-line programs. J. ACM 68(4), 1–40 (2021). https://doi.org/10.1145/3457389
Jeż, A.: Approximation of grammar-based compression via recompression. Theoret. Comput. Sci. 592, 115–134 (2015)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987). https://doi.org/10.1147/rd.312.0249
Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler transform conjecture. Commun. ACM 65(6), 91–98 (2022). https://doi.org/10.1145/3531445
Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (2018). https://doi.org/10.1145/3188745.3188814
Kini, D., Mathur, U., Viswanathan, M.: Data race detection on compressed traces. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 26–37. ESEC/FSE 2018, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3236024.3236025
Kreft, S., Navarro, G.: LZ77-like compression with fast random access. In: 2010 Data Compression Conference, pp. 239–248 (2010)
Larsson, N., Moffat, A.: Offline dictionary-based compression. In: Proceedings DCC 1999 Data Compression Conference (Cat. No. PR00096), pp. 296–305 (1999)
Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976)
Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), article 29 (2021)
Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7(1), 67–82 (1997)
Nishimoto, T., Inenaga, S., Bannai, H., Takeda, M.: Fully dynamic data structure for LCE queries in compressed space. In: 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), vol. 58, pp. 72:1–72:15 (2016)
Przeworski, M., Hudson, R., Di Rienzo, A.: Adjusting the focus on human variation. Trends Genetics: TIG 16(7), 296–302 (2000)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1), 211–222 (2003)
Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proceedings 24th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 247–258 (2013)
Zhang, M., Mathur, U., Viswanathan, M.: Checking LTL[F, G, X] on compressed traces in polynomial time. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 131–143. ESEC/FSE 2021, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3468264.3468557
A PSV and NSV Queries
Other relevant queries are previous smaller value (PSV) and next smaller value (NSV) [6, 9], defined as follows:
- \(\texttt {psv}(i)= \texttt {max}(\{j \,|\, j< i, w[j] < w[i]\}\cup \{0\})\)
- \(\texttt {nsv}(i) = \texttt {min}(\{j \,|\, j > i, w[j] < w[i]\}\cup \{n+1\})\)
- \(\texttt {psv}'(i, d)= \texttt {max}(\{j \,|\, j< i, w[j] < d\}\cup \{0\})\)
- \(\texttt {nsv}'(i, d) = \texttt {min}(\{j \,|\, j > i, w[j] < d\}\cup \{n+1\})\)
Note that the first two queries reduce to the latter two, since \(\texttt {psv}(i) = \texttt {psv}'(i, w[i])\) and \(\texttt {nsv}(i) = \texttt {nsv}'(i, w[i])\): it suffices to access w[i], in \(\mathcal {O}(\log n)\) time, and then issue the corresponding query. We show that the latter queries can be answered in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) time.
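As a reference point for these definitions, the following Python sketch answers the four queries by brute force on an explicit array, with 1-based positions as in the text. It takes O(n) time per query and is not the compressed-space method of Theorem 5; it only pins down the semantics (strict inequalities, sentinels 0 and n+1) and the reduction \(\texttt {psv}(i) = \texttt {psv}'(i, w[i])\). The function names are illustrative.

```python
def psv_prime(w, i, d):
    """psv'(i, d) = max({j | j < i, w[j] < d} ∪ {0}), positions counted from 1."""
    for j in range(i - 1, 0, -1):      # scan leftwards from i - 1 down to 1
        if w[j - 1] < d:
            return j
    return 0

def nsv_prime(w, i, d):
    """nsv'(i, d) = min({j | j > i, w[j] < d} ∪ {n + 1})."""
    n = len(w)
    for j in range(i + 1, n + 1):      # scan rightwards from i + 1 up to n
        if w[j - 1] < d:
            return j
    return n + 1

def psv(w, i):
    return psv_prime(w, i, w[i - 1])   # access w[i], then ask psv'

def nsv(w, i):
    return nsv_prime(w, i, w[i - 1])   # access w[i], then ask nsv'

w = [3, 1, 4, 1, 5, 9, 2, 6]
print([psv(w, i) for i in range(1, 9)])  # [0, 0, 2, 0, 4, 5, 4, 7]
print([nsv(w, i) for i in range(1, 9)])  # [2, 9, 4, 9, 7, 7, 9, 9]
```

Because the comparison is strict, equal values do not stop the scan: for \(i = 4\) (value 1), the earlier 1 at position 2 is skipped and the answer is 0.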
Theorem 5
It is possible to construct an index of size \(\mathcal {O}(g_{rl})\) supporting PSV and NSV queries in \(\mathcal {O}(\log n)\) time.
Proof
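Let G be a balanced RLSLP of size \(\mathcal {O}(g_{rl})\) constructed as in Theorem 1. For every variable A, store the values \(L[A] = |\texttt {exp}(A)|\) and \(M[A] = \texttt {min}(\{\texttt {exp}(A)[i]\,|\, i \in [1.. L[A]]\})\) in arrays; these add only \(\mathcal {O}(g_{rl})\) extra space. Let \(\texttt {psv}'(A, i, d)\) denote the query \(\texttt {psv}'(i, d)\) evaluated on \(\texttt {exp}(A)\) instead of w, where i may exceed L[A]; the query on w is then answered on the initial variable of G. To compute \(\texttt {psv}'(A, i, d)\), proceed as follows: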
1. If \(i=1\) or \(M[A] \ge d\), return 0.
2. If \(A \rightarrow a\), return 1 (step 1 ensures that \(i \ge 2\) and \(a < d\)).
3. If \(A \rightarrow BC\), then:
   (a) If \(i \le L[B]+1\), return \(\texttt {psv}'(B, i, d)\).
   (b) If \(L[B]+1 < i\), let \(k = \texttt {psv}'(C, i - L[B], d)\). If \(k > 0\), return \(L[B] + k\); otherwise, return \(\texttt {psv}'(B, i, d)\).
4. If \(A \rightarrow B^t\) for \(t > 2\), then:
   (a) If \(i \le L[B]+1\), return \(\texttt {psv}'(B, i, d)\).
   (b) If \(L[B]+1 < i \le L[A]\), let \(t' = \lfloor (i-1)/L[B] \rfloor\), so that \(i \in [t'L[B] +1..(t'+1)L[B]]\) with \(1 \le t' \le t-1\), and let \(k = \texttt {psv}'(B, i - t'L[B], d)\). If \(k > 0\), return \(t'L[B] + k\); otherwise, return \((t'-1)L[B] + \texttt {psv}'(B,i,d)\).
   (c) If \(L[A]<i\), return \((t-1)L[B]+\texttt {psv}'(B,i,d)\).
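The steps above can be transcribed almost literally into Python. The sketch below assumes a hypothetical tuple encoding of the rules with integer terminals (not from the paper), computes \(t'\) in step 4(b) as \(\lfloor (i-1)/L[B] \rfloor\), and checks the recursion against a brute-force scan of the expanded string on a small example grammar. It does not build the balanced RLSLP of Theorem 1, so it illustrates correctness, not the \(\mathcal {O}(\log n)\) bound.

```python
# Hypothetical RLSLP encoding (not from the paper), integer terminals:
#   ("term", a) | ("pair", B, C) | ("run", B, t) with t > 2

def build_L_M(G):
    """L[A] = |exp(A)|, M[A] = min(exp(A)); each variable is visited once."""
    L, M = {}, {}
    def visit(A):
        if A in L:
            return
        kind, *args = G[A]
        if kind == "term":
            L[A], M[A] = 1, args[0]
        elif kind == "pair":
            B, C = args
            visit(B); visit(C)
            L[A], M[A] = L[B] + L[C], min(M[B], M[C])
        else:                                   # "run": A -> B^t
            B, t = args
            visit(B)
            L[A], M[A] = t * L[B], M[B]         # repetition keeps the minimum
    for A in G:
        visit(A)
    return L, M

def make_grammar_psv(G):
    L, M = build_L_M(G)

    def psv(A, i, d):
        """max({j | j < i, exp(A)[j] < d} ∪ {0}); i may exceed L[A] + 1."""
        if i == 1 or M[A] >= d:                 # step 1
            return 0
        kind, *args = G[A]
        if kind == "term":                      # step 2: A -> a with a < d
            return 1
        if kind == "pair":                      # step 3: A -> BC
            B, C = args
            if i <= L[B] + 1:                   # 3(a)
                return psv(B, i, d)
            k = psv(C, i - L[B], d)             # 3(b)
            return L[B] + k if k > 0 else psv(B, i, d)
        B, t = args                             # step 4: A -> B^t
        if i <= L[B] + 1:                       # 4(a)
            return psv(B, i, d)
        if i <= L[A]:                           # 4(b)
            tp = (i - 1) // L[B]                # t', with 1 <= t' <= t - 1
            k = psv(B, i - tp * L[B], d)
            if k > 0:
                return tp * L[B] + k
            return (tp - 1) * L[B] + psv(B, i, d)
        return (t - 1) * L[B] + psv(B, i, d)    # 4(c): L[A] < i

    return psv

def expand(G, A):
    """Naive expansion, used only to check the sketch on small inputs."""
    kind, *args = G[A]
    if kind == "term":
        return [args[0]]
    if kind == "pair":
        return expand(G, args[0]) + expand(G, args[1])
    return expand(G, args[0]) * args[1]

def brute_psv(w, i, d):
    return max([j for j in range(1, min(i, len(w) + 1)) if w[j - 1] < d] + [0])

# exp(S) = [3, 1, 3, 1, 3, 1, 4, 2, 4, 2, 4, 2, 4, 2, 5]
G = {
    "x": ("term", 3), "y": ("term", 1), "z": ("term", 4),
    "u": ("term", 2), "v": ("term", 5),
    "P": ("pair", "x", "y"), "R": ("run", "P", 3),
    "Q": ("pair", "z", "u"), "T": ("run", "Q", 4),
    "U": ("pair", "T", "v"), "S": ("pair", "R", "U"),
}
psv = make_grammar_psv(G)
w = expand(G, "S")
for d in range(7):
    for i in range(1, len(w) + 2):
        assert psv("S", i, d) == brute_psv(w, i, d), (i, d)
print(psv("S", 9, 3))  # 8: the value 2 at position 8
```

Passing i unchanged to the fallback calls (as in 3(b), 4(b), and 4(c)) works because any \(i > L[B]\) makes the call on B search all of \(\texttt {exp}(B)\).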
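Call a query \(\texttt {psv}'(A, i, d)\) simple if \(i > L[A]\), that is, if it asks for the last position of \(\texttt {exp}(A)\) holding a value smaller than d. The guard in step 1 guarantees that, in a simple query, at most one recursive call needs more than \(\mathcal {O}(1)\) time: for instance, in case 3(b) the call on C either returns 0 immediately (if \(M[C] \ge d\)) or returns a positive value, in which case B is not visited. In general, case 3(b) can make two calls, but the second one (on B) is simple, and it takes more than \(\mathcal {O}(1)\) time only if \(M[B] < d\), in which case it returns a positive answer that propagates to the root without further calls. The case of run-length rules is analogous. Thus, the running time is proportional to the height of G, which is \(\mathcal {O}(\log n)\) because G is balanced. The query \(\texttt {nsv}'\) is handled symmetrically. \(\square \)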
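The text only states that \(\texttt {nsv}'\) is handled symmetrically. One possible mirror of the recursion is sketched below under our own conventions, not the paper's: the internal function returns the first position \(j > i\) of \(\texttt {exp}(A)\) with a value smaller than d, uses 0 to mean "not found" (mapped to n+1 only at the top level), and lets \(i = 0\) play the role of the simple query that scans all of \(\texttt {exp}(A)\). Rules use a hypothetical tuple encoding with integer terminals, and correctness is only checked against brute force on a small example.

```python
# Hypothetical RLSLP encoding (not from the paper), integer terminals:
#   ("term", a) | ("pair", B, C) | ("run", B, t) with t > 2

def build_L_M(G):
    """L[A] = |exp(A)|, M[A] = min(exp(A)); each variable is visited once."""
    L, M = {}, {}
    def visit(A):
        if A in L:
            return
        kind, *args = G[A]
        if kind == "term":
            L[A], M[A] = 1, args[0]
        elif kind == "pair":
            B, C = args
            visit(B); visit(C)
            L[A], M[A] = L[B] + L[C], min(M[B], M[C])
        else:                                   # "run": A -> B^t
            B, t = args
            visit(B)
            L[A], M[A] = t * L[B], M[B]
    for A in G:
        visit(A)
    return L, M

def make_nsv_prime(G, S):
    L, M = build_L_M(G)

    def nsv(A, i, d):
        """min{j | i < j <= L[A], exp(A)[j] < d}, or 0 if none (i >= 0)."""
        if i >= L[A] or M[A] >= d:              # mirror of the step-1 guard
            return 0
        kind, *args = G[A]
        if kind == "term":                      # here i == 0 and a < d
            return 1
        if kind == "pair":                      # A -> BC
            B, C = args
            if i >= L[B]:                       # only C can hold the answer
                k = nsv(C, i - L[B], d)
                return L[B] + k if k > 0 else 0
            k = nsv(B, i, d)
            if k > 0:
                return k
            r = nsv(C, 0, d)                    # simple call: first hit in C
            return L[B] + r if r > 0 else 0
        B, t = args                             # A -> B^t
        tp = i // L[B]                          # copy of B holding position i + 1
        k = nsv(B, i - tp * L[B], d)
        if k > 0:
            return tp * L[B] + k
        if tp + 1 < t:                          # first hit in the next copy
            return (tp + 1) * L[B] + nsv(B, 0, d)
        return 0

    def nsv_prime(i, d):
        """nsv'(i, d) on w = exp(S), with the sentinel n + 1."""
        r = nsv(S, i, d)
        return r if r > 0 else L[S] + 1

    return nsv_prime

def expand(G, A):
    kind, *args = G[A]
    if kind == "term":
        return [args[0]]
    if kind == "pair":
        return expand(G, args[0]) + expand(G, args[1])
    return expand(G, args[0]) * args[1]

def brute_nsv(w, i, d):
    n = len(w)
    return min([j for j in range(i + 1, n + 1) if w[j - 1] < d] + [n + 1])

G = {
    "x": ("term", 3), "y": ("term", 1), "z": ("term", 4),
    "u": ("term", 2), "v": ("term", 5),
    "P": ("pair", "x", "y"), "R": ("run", "P", 3),
    "Q": ("pair", "z", "u"), "T": ("run", "Q", 4),
    "U": ("pair", "T", "v"), "S": ("pair", "R", "U"),
}
nsv_prime = make_nsv_prime(G, "S")
w = expand(G, "S")
for d in range(7):
    for i in range(1, len(w) + 1):
        assert nsv_prime(i, d) == brute_nsv(w, i, d), (i, d)
print(nsv_prime(6, 3))  # 8
```

Returning 0 internally keeps the offset arithmetic simple: a positive result from a child can always be shifted by the length of the material to its left.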
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Navarro, G., Olivares, F., Urbina, C. (2022). Balancing Run-Length Straight-Line Programs. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_9
DOI: https://doi.org/10.1007/978-3-031-20643-6_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)