Abstract
We define a generalization of the prefix sum problem in which the vector can be masked by segments of a second (Boolean) vector. This problem is shown to be related to several other prefix sum, set intersection and approximate string matching problems, via specific algorithms, reductions and conditional lower bounds. To our knowledge, we are the first to consider fully internal measurement queries and to prove lower bounds for them. We also discuss the hardness of the sparse variation in both static and dynamic settings. Finally, we provide a parallel algorithm to compute the answers to all possible queries when both vectors are fixed.
Notes
- 1.
- 1. We will use \(\log \) to denote \(\log _2\); since all our logarithms eventually end up in asymptotic notation, the constant base is irrelevant.
Appendices
A Details Omitted from Sect. 3
Proof of Theorem 3. Given a bit vector B of length m and an array A of length n, there is a data structure that uses \(O(\frac{mn}{f(n)}+m+n)\) words of space, answers masked prefix sum queries in \(O(f(n)+g(n))\) time and supports updates in \(O(\frac{mn\log f(n)}{g(n)f(n)} +g(n))\) time, for any functions f(n) and g(n) with \(0<f(n) < n\) and \(0< g(n) < m+n\).
Alternatively, for any \(c > 0\), there is a data structure that uses \(O(\frac{mn}{f(n)} + \frac{n^{1+c}}{c\log n}+m+n)\) words of space and can answer masked prefix sum queries in \(O(\frac{f(n)}{c\log n}+g(n))\) time and support updates in \(O(\frac{mn \log f(n)}{g(n)f(n)}+\frac{n^{1+c}}{g(n)}+g(n))\) time. If \(m = O(n)\), setting \(f(n) = n^{2/3}\log n\), \(g(n) = n^{2/3}\) and \(c = 1/3\) yields an \(O(n^{4/3}/\log n)\)-word data structure with \(O(n^{2/3})\) query and update times.
Proof
We first present a data structure with amortized update bounds. The main idea is to rebuild the data structures from Theorem 1 every g(n) updates. Since Theorem 1 presents multiple trade-offs, in the rest of the proof we use s(m, n), p(m, n) and q(n) to denote the space cost, preprocessing time and query time of the data structures in that theorem. Between rebuildings, we maintain two copies of the array and the bit mask: A and B store their current content, while \(A'\) and \(B'\) store their content at the time of the previous rebuilding. Thus, the data structure D constructed in the previous rebuilding can be used to answer masked prefix sum queries over \(A'\) and \(B'\). For the updates that arrive after the previous rebuilding, we maintain two sorted lists: \(L_A\), which stores the indexes of the entries of A updated since the previous rebuilding, and \(L_B\), which stores the indexes of the entries of B updated since the previous rebuilding. Since the length of either list is at most \(g(n) < m+n\), all the data structures occupy \(O(s(m,n) + m + n)\) words.
We then answer a masked prefix sum query as follows. Let k and i be the parameters of the query, i.e., we aim to compute \(\sum _{j=1}^{k} A[j]\cdot B[i+j-1]\). We first perform this query using D in q(n) time, obtaining what the answer would be if there had been no updates since the last rebuilding. Since both \(L_A\) and \(L_B\) are sorted, we can walk through them to compute the indexes of the elements of A that have either been updated since the last rebuilding, or are mapped by the query to a bit of B that has been updated since the last rebuilding. This uses O(g(n)) time. Then, for each such index d, we consult A, \(A'\), B and \(B'\) to compute how much the update, to either A[d] or \(B[d + i-1]\), changes the answer to the query relative to the answer given by D. This again requires O(g(n)) time over all these indexes. The entire process thus answers a query in \(O(q(n)+g(n))\) time.
For each update, it requires O(1) time to keep A and B up-to-date. It also requires an update to the sorted list \(L_A\) or \(L_B\), which can be done in O(g(n)) time. Finally, since the rebuilding requires O(p(m, n)) time and it is done every g(n) updates, the amortized cost of each update is then \(O(p(m,n)/g(n) + g(n))\).
The bounds in this theorem thus follow from the specific bounds on s(m, n), p(m, n) and q(n) in Theorem 1.
Finally, to deamortize using the global rebuilding approach, instead of rebuilding this data structure entirely during the update operation that triggers the rebuilding, we rebuild it over the next g(n) updates. This requires us to create two additional lists \(L_A'\) and \(L_B'\): Each time a rebuilding starts, we rename \(L_A\) and \(L_B\) to \(L_A'\) and \(L_B'\), and create new empty lists \(L_A\) and \(L_B\) to maintain indexes of the updates that arrive after the rebuilding starts. To answer a query, we cannot use the data structure that is currently being rebuilt since it is not complete, but we use the previous version of it and consult \(L_A\), \(L_B\), \(L_A'\) and \(L_B'\) to compute the answer using ideas similar to those described in previous paragraphs. \(\square \)
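As an illustration, the amortized rebuilding scheme above can be sketched in Python. This is a minimal sketch under several assumptions: the class and function names are ours, indexes are 0-based (so the query computes \(\sum_{j=0}^{k-1} A[j]\cdot B[i+j]\)), the snapshot-based `_static_query` stands in for the static structure of Theorem 1, and the correction step uses a set rather than the sorted-merge walk, so the theorem's asymptotic bounds are not preserved.

```python
import bisect

def naive_masked_prefix_sum(A, B, k, i):
    """Reference answer: sum_{j=0}^{k-1} A[j] * B[i+j] (0-based)."""
    return sum(A[j] * B[i + j] for j in range(k))

class DynamicMaskedPrefixSum:
    """Rebuilds a snapshot every g updates; queries combine the
    snapshot answer with corrections from the short update lists."""

    def __init__(self, A, B, g):
        self.A, self.B, self.g = list(A), list(B), g
        self._rebuild()

    def _rebuild(self):
        # Snapshot (A', B') playing the role of the static structure D.
        self.A0, self.B0 = list(self.A), list(self.B)
        self.LA, self.LB = [], []  # sorted indexes updated since rebuild

    def _static_query(self, k, i):
        # Stand-in for the Theorem 1 structure: naive over the snapshot.
        return naive_masked_prefix_sum(self.A0, self.B0, k, i)

    def _after_update(self, L, d):
        if d not in L:
            bisect.insort(L, d)
        if len(self.LA) + len(self.LB) >= self.g:
            self._rebuild()

    def update_A(self, d, v):
        self.A[d] = v
        self._after_update(self.LA, d)

    def update_B(self, d, v):
        self.B[d] = v
        self._after_update(self.LB, d)

    def query(self, k, i):
        ans = self._static_query(k, i)
        # Positions of A whose contribution changed since the snapshot:
        # either A[d] itself was updated, or the bit B[i+d] it is
        # mapped to was updated.
        dirty = {d for d in self.LA if d < k}
        dirty |= {d - i for d in self.LB if 0 <= d - i < k}
        for d in dirty:
            ans += self.A[d] * self.B[i + d] - self.A0[d] * self.B0[i + d]
        return ans
```

With g(n) set as in the theorem, the rebuild cost is spread over g(n) updates, matching the amortized bound; the deamortization then replaces the eager `_rebuild` with one spread over the next g(n) updates, as described above.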
B Details Omitted from Sect. 5
Proof of Lemma 5. Assume a constant-size alphabet \(\varSigma \). Then there is a linear-time reduction from InternalHammingDistance to the internal inner product problem, and vice versa. Moreover, there is a linear-time reduction from the InternalEMWW problem to the internal inner product problem, and vice versa.
Proof
The reduction from InternalHammingDistance to the internal inner product. For each letter \(\sigma \in \varSigma \), we convert S and T into bit vectors: in T, each occurrence of \(\sigma \) becomes 1 and each letter of \(\varSigma \setminus \{\sigma \}\) becomes 0, while in S, each occurrence of \(\sigma \) becomes 0 and each letter of \(\varSigma \setminus \{\sigma \}\) becomes 1. Thus, a Hamming distance query can be answered by summing a constant number (\(|\varSigma |\)) of internal inner products.
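To make the per-letter encoding concrete, here is a small Python sketch (the function name is ours, and the naive sums stand in for the internal inner product queries of the reduction). The product of the two indicator bits at position j is 1 exactly when \(T[j] = \sigma \ne S[j]\), so summing over all letters counts each mismatching position once.

```python
def hamming_via_inner_products(S, T, alphabet):
    """Hamming distance of equal-length S and T, computed as a sum of
    |alphabet| inner products of per-letter indicator vectors."""
    assert len(S) == len(T)
    total = 0
    for sigma in alphabet:
        t_bits = [1 if c == sigma else 0 for c in T]  # sigma -> 1 in T
        s_bits = [0 if c == sigma else 1 for c in S]  # sigma -> 0 in S
        total += sum(s * t for s, t in zip(s_bits, t_bits))
    return total
```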
The reduction from the internal inner product problem to InternalHammingDistance. Assume we have two bit vectors A and B. Every 1 in A is mapped to 001 and every 0 to 010, while in B, every 1 is mapped to 001 and every 0 to 100. Let S and T be the strings obtained from A and B, respectively. It is easy to see that aligning a 1 in A with a 1 in B causes 0 mismatches between the corresponding substrings of S and T, while each of the other three combinations results in exactly 2 mismatches. Here, corresponding substrings means that the starting and ending positions of the substrings are chosen to fit the original query, i.e., by multiplying the query indexes by 3. Note that this reduction also transfers the internal inner product to InternalEMWW.
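A sketch of this encoding (function names are ours): since each of the n aligned bit pairs contributes 2 mismatches unless both bits are 1, the inner product can be read off the Hamming distance of the encodings as \(n - \text{mismatches}/2\).

```python
def encode_A(bits):
    """A-side encoding: 1 -> 001, 0 -> 010."""
    return "".join("001" if b else "010" for b in bits)

def encode_B(bits):
    """B-side encoding: 1 -> 001, 0 -> 100."""
    return "".join("001" if b else "100" for b in bits)

def inner_product_via_hamming(A, B):
    """<A, B> recovered from the Hamming distance of the encodings."""
    S, T = encode_A(A), encode_B(B)
    mismatches = sum(s != t for s, t in zip(S, T))
    # Every pair other than (1, 1) contributes exactly 2 mismatches.
    return len(A) - mismatches // 2
```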
The reduction from the InternalEMWW problem to the internal inner product. In a similar way, the inner product solves the exact matching with wildcards problem. We repeat the process described above for Hamming distance, except that wildcards are always mapped to 0 in both S and T. It is easy to see that when the sum over all the inner products is 0, there is an exact match with wildcards.
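A sketch of the wildcard variant (names and the `?` wildcard symbol are ours): mapping wildcards to 0 on both sides makes every product involving a wildcard position vanish, so the sum counts only genuine letter-versus-letter mismatches and is 0 exactly when the strings match with wildcards.

```python
def matches_with_wildcards(S, T, alphabet, wildcard="?"):
    """True iff equal-length S and T match position-wise, with the
    wildcard matching any letter; computed via per-letter inner products."""
    assert len(S) == len(T)
    total = 0
    for sigma in alphabet:
        t_bits = [1 if c == sigma else 0 for c in T]              # wildcard -> 0
        s_bits = [0 if c in (sigma, wildcard) else 1 for c in S]  # wildcard -> 0
        total += sum(s * t for s, t in zip(s_bits, t_bits))
    return total == 0
```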
\(\square \)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Das, R., He, M., Kondratovsky, E., Munro, J.I., Wu, K. (2022). Internal Masked Prefix Sums and Its Connection to Fully Internal Measurement Queries. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_16