Internal Masked Prefix Sums and Its Connection to Fully Internal Measurement Queries

Conference paper, in: String Processing and Information Retrieval (SPIRE 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13617)


Abstract

We define a generalization of the prefix sum problem in which the vector can be masked by segments of a second (Boolean) vector. We show that this problem is related to several other prefix sum, set intersection and approximate string matching problems via specific algorithms, reductions and conditional lower bounds. To our knowledge, we are the first to consider fully internal measurement queries and to prove lower bounds for them. We also discuss the hardness of the sparse variant in both the static and dynamic settings. Finally, we give a parallel algorithm that computes the answers to all possible queries when both vectors are fixed.


Notes

  1. We will use \(\log \) to denote \(\log _2\), though since all our logarithms eventually end up inside asymptotic notation, the constant base is irrelevant.



Appendices

A Details Omitted from Sect. 3

Proof of Theorem 3. Given a bit vector B of length m and an array A of length n, there is a data structure that uses \(O(\frac{mn}{f(n)}+m+n)\) words of space, can answer masked prefix sum queries in \(O(f(n)+g(n))\) time and supports updates in \(O(\frac{mn\log f(n)}{g(n)f(n)} +g(n))\) time, for any functions f(n) and g(n) with \(0<f(n) < n\) and \(0< g(n) < m+n\).

Alternatively, for any \(c > 0\), there is a data structure that uses \(O(\frac{mn}{f(n)} + \frac{n^{1+c}}{c\log n}+m+n)\) words of space and can answer masked prefix sum queries in \(O(\frac{f(n)}{c\log n}+g(n))\) time and support updates in \(O(\frac{mn \log f(n)}{g(n)f(n)}+\frac{n^{1+c}}{g(n)}+g(n))\) time. If \(m = O(n)\), setting \(f(n) = n^{2/3}\log n\), \(g(n) = n^{2/3}\) and \(c = 1/3\) yields an \(O(n^{4/3}/\log n)\)-word data structure with \(O(n^{2/3})\) query and update times.

Proof

We first present a data structure with amortized bounds on update operations. The main idea is to rebuild the data structures from Theorem 1 every g(n) updates. Since Theorem 1 presents multiple trade-offs, in the rest of the proof we use s(m, n), p(m, n) and q(n) to denote the space cost, preprocessing time and query time of the data structures in that theorem. Between rebuildings, we maintain two copies of the array and the bit mask: A and B store their current content, while \(A'\) and \(B'\) store their content at the time of the previous rebuilding. Thus, the data structure D constructed at the previous rebuilding can be used to answer masked prefix sum queries over \(A'\) and \(B'\). For the updates that arrived after the previous rebuilding, we maintain two sorted lists: \(L_A\), storing the indexes of the entries of A that have been updated since the previous rebuilding, and \(L_B\), storing the indexes of the entries of B that have been updated since the previous rebuilding. Since the length of either list is at most \(g(n) < m+n\), all the data structures occupy \(O(s(m,n) + m + n)\) words.

We then answer a masked prefix sum query as follows. Let k and i be the parameters of the query, i.e., we aim at computing \(\sum _{j=1}^{k} A[j]\cdot B[i+j-1]\). We first perform this query using D in q(n) time, obtaining what the answer would be if there had been no updates since the last rebuilding. Since both \(L_A\) and \(L_B\) are sorted, we can walk through them to compute the indexes of the elements of A that either have been updated since the last rebuilding, or are mapped by the query to a bit in B that has been updated since the last rebuilding. This takes O(g(n)) time. Then, for each such index d, we consult A, \(A'\), B and \(B'\) to compute how much the update, to either A[d] or \(B[d + i-1]\), changes the answer to the query relative to the answer given by D. This again requires O(g(n)) time over all these indexes. The entire process thus answers a query in \(O(q(n)+g(n))\) time.
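To make the correction step concrete, here is a minimal Python sketch; the brute-force routine stands in for the query to D, and the function names and 1-based indexing convention are illustrative, not taken from the paper (it also uses a set of affected indexes rather than the O(g(n)) walk over the two sorted lists).

```python
def masked_prefix_sum(A, B, k, i):
    """Brute-force masked prefix sum: sum_{j=1..k} A[j] * B[i+j-1], 1-based indices."""
    return sum(A[j - 1] * B[i + j - 2] for j in range(1, k + 1))

def corrected_query(stale_answer, A, B, A_old, B_old, L_A, L_B, k, i):
    """Correct stale_answer, computed by D over (A_old, B_old), for the updates
    recorded in the index lists L_A (indexes into A) and L_B (indexes into B)."""
    # Indexes d whose term A[d] * B[d + i - 1] may have changed: either A[d] was
    # updated, or the bit B[d + i - 1] it is matched against was updated.
    affected = {d for d in L_A if d <= k}
    affected.update(d - i + 1 for d in L_B if 1 <= d - i + 1 <= k)
    answer = stale_answer
    for d in affected:
        answer += A[d - 1] * B[d + i - 2] - A_old[d - 1] * B_old[d + i - 2]
    return answer
```

On any small example, corrected_query(masked_prefix_sum(A_old, B_old, k, i), A, B, A_old, B_old, L_A, L_B, k, i) should equal masked_prefix_sum(A, B, k, i) whenever L_A and L_B record all indexes that changed since the snapshot.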

Each update requires O(1) time to keep A and B up-to-date, plus an insertion into the sorted list \(L_A\) or \(L_B\), which can be done in O(g(n)) time. Finally, since a rebuilding requires O(p(m, n)) time and is performed every g(n) updates, the amortized cost of each update is \(O(p(m,n)/g(n) + g(n))\).
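The update path can be sketched in the same spirit. The class below only illustrates the bookkeeping (the rebuild here merely re-snapshots A and B, where a real implementation would rebuild the Theorem 1 structure D); the class and method names are not from the paper.

```python
import bisect

class DynamicMaskedPrefixSums:
    """Illustrative bookkeeping for the amortized variant of Theorem 3."""

    def __init__(self, A, B, g):
        self.A, self.B = list(A), list(B)
        self.g = g                      # rebuild every g updates
        self._rebuild()

    def _rebuild(self):
        # Snapshot A and B; a real implementation would also rebuild D here.
        self.A_old, self.B_old = list(self.A), list(self.B)
        self.L_A, self.L_B = [], []     # sorted, duplicate-free update indexes
        self.pending = 0

    def update_A(self, j, value):       # 1-based index j
        self.A[j - 1] = value
        if j not in self.L_A:
            bisect.insort(self.L_A, j)
        self._count_update()

    def update_B(self, j, bit):         # 1-based index j
        self.B[j - 1] = bit
        if j not in self.L_B:
            bisect.insort(self.L_B, j)
        self._count_update()

    def _count_update(self):
        self.pending += 1
        if self.pending >= self.g:      # amortized rebuild every g updates
            self._rebuild()
```

The lists L_A and L_B maintained here are exactly what the query sketch above consumes.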

The bounds in this theorem thus follow from the specific bounds on s(m, n), p(m, n) and q(n) in Theorem 1.

Finally, to deamortize using the global rebuilding approach, instead of rebuilding the data structure entirely during the update operation that triggers the rebuilding, we spread the rebuilding over the next g(n) updates. This requires two additional lists \(L_A'\) and \(L_B'\): each time a rebuilding starts, we rename \(L_A\) and \(L_B\) to \(L_A'\) and \(L_B'\), and create new empty lists \(L_A\) and \(L_B\) to record the indexes of the updates that arrive after the rebuilding starts. To answer a query, we cannot use the data structure that is currently being rebuilt, since it is incomplete; instead, we use its previous version and consult \(L_A\), \(L_B\), \(L_A'\) and \(L_B'\) to compute the answer, using ideas similar to those described in the previous paragraphs. \(\square \)

B Details Omitted from Sect. 5

Proof of Lemma 5. Assume a constant-size alphabet \(\varSigma \). Then there is a linear-time reduction from the InternalHammingDistance problem to the internal inner product problem, and vice versa. Moreover, there is a linear-time reduction from the InternalEMWW problem to the internal inner product problem, and vice versa.

Proof

The reduction from InternalHammingDistance to the internal inner product. For each letter \(\sigma \in \varSigma \), we turn S and T into bit vectors: in T, occurrences of \(\sigma \) become 1 and all other letters become 0, while in S, occurrences of \(\sigma \) become 0 and all other letters become 1. Thus a Hamming distance query is answered by summing a constant number of internal inner products, one per letter of \(\varSigma \).
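A small Python sketch of this direction, assuming 0-based substrings S[a..a+length-1] and T[b..b+length-1]; the helper name is illustrative.

```python
def hamming_via_inner_products(S, T, a, b, length):
    """Hamming distance between S[a:a+length] and T[b:b+length],
    expressed as a sum of |Sigma| inner products of 0/1 vectors."""
    sigma = set(S) | set(T)
    total = 0
    for c in sigma:
        # In T, occurrences of c become 1; in S, occurrences of c become 0.
        T_c = [1 if x == c else 0 for x in T]
        S_c = [0 if x == c else 1 for x in S]
        total += sum(S_c[a + j] * T_c[b + j] for j in range(length))
    return total
```

Comparing against sum(x != y for x, y in zip(S[a:a+length], T[b:b+length])) on small inputs confirms the identity.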

The reduction from the internal inner product problem to InternalHammingDistance. Assume we have two bit vectors A and B. Every 1 in A is transformed into 001 and every 0 into 010, while in B, every 1 is transformed into 001 and every 0 into 100. Let S and T be the strings obtained from A and B, respectively. It is easy to see that only a 1 in A aligned against a 1 in B causes 0 mismatches between the corresponding substrings of S and T; any of the other three combinations results in 2 mismatches. Here corresponding substrings means that the starting and ending positions of the substrings are chosen to match the original query, i.e., by multiplying the query indices by 3. Note that this reduction also transforms the internal inner product problem into InternalEMWW.
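A sketch of this reverse direction: after encoding, every aligned pair other than (1, 1) contributes exactly 2 mismatches, so the inner product can be read off a single Hamming distance query on the encoded strings (0-based indices here, scaled by 3 as in the text; the function names are illustrative).

```python
def encode_A(A):
    """A: 1 -> 001, 0 -> 010."""
    return "".join("001" if bit else "010" for bit in A)

def encode_B(B):
    """B: 1 -> 001, 0 -> 100."""
    return "".join("001" if bit else "100" for bit in B)

def inner_product_via_hamming(A, B, i, k):
    """Inner product sum_{j=0}^{k-1} A[j] * B[i+j], recovered from one
    Hamming distance query on the encoded strings."""
    S, T = encode_A(A), encode_B(B)
    s, t = S[:3 * k], T[3 * i:3 * (i + k)]
    hd = sum(x != y for x, y in zip(s, t))  # every non-(1,1) pair contributes 2
    return k - hd // 2
```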

The reduction from the InternalEMWW problem to the internal inner product. In a similar way, the inner product solves the exact matching with wildcards problem. We repeat the same process as described above for Hamming distance, except that wildcards are always mapped to 0 in both S and T. It is easy to see that there is an exact match with wildcards exactly when the sum over all the inner products is 0.
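The wildcard case can be sketched the same way, with wildcards mapped to 0 on both sides; the use of '?' as the wildcard symbol is an assumption of the example.

```python
def matches_with_wildcards(S, T, a, b, length, wildcard="?"):
    """True iff S[a:a+length] matches T[b:b+length], where the wildcard matches
    anything, decided by summing per-letter inner products with wildcards -> 0."""
    sigma = (set(S) | set(T)) - {wildcard}
    total = 0
    for c in sigma:
        T_c = [1 if x == c else 0 for x in T]                      # wildcard -> 0
        S_c = [1 if (x != c and x != wildcard) else 0 for x in S]  # wildcard -> 0
        total += sum(S_c[a + j] * T_c[b + j] for j in range(length))
    return total == 0
```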

\(\square \)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Das, R., He, M., Kondratovsky, E., Munro, J.I., Wu, K. (2022). Internal Masked Prefix Sums and Its Connection to Fully Internal Measurement Queries. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_16


  • DOI: https://doi.org/10.1007/978-3-031-20643-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20642-9

  • Online ISBN: 978-3-031-20643-6
