Abstract
In their ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute the k-th frequency moment F k (for k > 2) in space O(poly − log(n,m) · n\(^{1-{2} \over{k}})\), giving the first optimal result up to poly-logarithmic factors in n and m (here m is the length of the stream and n is the size of the domain.) The method of Indyk and Woodruff reduces the problem of F k to the problem of computing heavy hitters in the streaming manner. Their reduction only requires polylogarithmic overhead in term of the space complexity and is based on the fundamental idea of “layering.” Since 2005 the method of Indyk and Woodruff has been used in numerous applications and has become a standard tool for streaming computations.
We propose a new recursive sketch that generalizes and improves the reduction of Indyk and Woodruff. Our method works for any non-negative frequency-based function in several models, including the insertion-only model, the turnstile model and the sliding window model. For frequency-based functions with sublinear polynomial space complexity our reduction only requires log(c)(n) overhead, where log(c)(n) is the iterative log function. Thus, we improve the reduction of Indyk and Woodruff by polylogarithmic factor. We illustrate the generality of our method by several applications: frequency moments, frequency based functions, spatial data streams and measuring independence of data sets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58(1), 137–147 (1999)
Andoni, A., Krauthgamer, R., Onak, K.: Streaming algorithms via precision sampling. In: FOCS, pp. 363–372 (2011)
Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci. 68(4), 702–732 (2004)
Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002)
Beame, P., Jayram, T.S., Rudra, A.: Lower bounds for randomized read/write stream algorithms. In: STOC 2007: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pp. 689–698. ACM, New York (2007)
Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: SODA, pp. 708–713 (2006)
Braverman, V., Gelles, R., Ostrovsky, R.: How to catch l 2-heavy-hitters on sliding windows. In: Du, D.-Z., Zhang, G. (eds.) COCOON 2013. LNCS, vol. 7936, pp. 638–650. Springer, Heidelberg (2013)
Braverman, V., Ostrovsky, R.: Generalizing the layering method of indyk and woodruff: Recursive sketches for frequency-based vectors on streams. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds.) APPROX/RANDOM 2013. LNCS, vol. 8096, pp. 58–70. Springer, Heidelberg (2013)
Braverman, V., Ostrovsky, R.: Smooth histograms for sliding windows. In: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2007, pp. 283–293. IEEE Computer Society, Washington, DC (2007)
Braverman, V., Ostrovsky, R.: Measuring independence of datasets. In: Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, pp. 271–280. ACM, New York (2010)
Braverman, V., Ostrovsky, R.: Recursive sketching for frequency moments. CoRR, abs/1011.2571 (2010)
Braverman, V., Ostrovsky, R.: Zero-one frequency laws. In: Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, pp. 281–290. ACM, New York (2010)
Chakrabarti, A., Cormode, G., McGregor, A.: Robust lower bounds for communication and stream computation. In: STOC 2008: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pp. 641–650. ACM, New York (2008)
Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In: IEEE Conference on Computational Complexity, pp. 107–117 (2003)
Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002)
Coppersmith, D., Kumar, R.: An improved data stream algorithm for frequency moments. In: SODA, pp. 151–156 (2004)
Cormode, G., Datar, M., Indyk, P., Muthukrishnan, S.: Comparing data streams using hamming norms (how to zero in). IEEE Trans. on Knowl. and Data Eng. 15(3), 529–540 (2003)
Feigenbaum, J., Kannan, S., Strauss, M., Viswanathan, M.: An approximate l1-difference algorithm for massive data streams. In: FOCS 1999: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, p. 501. IEEE Computer Society, Washington, DC (1999)
Flajolet, P., Nigel Martin, G.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)
Ganguly, S.: Estimating frequency moments of data streams using random linear combinations. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) APPROX and RANDOM 2004. LNCS, vol. 3122, pp. 369–380. Springer, Heidelberg (2004)
Ganguly, S., Cormode, G.: On estimating frequency moments of data streams. In: Charikar, M., Jansen, K., Reingold, O., Rolim, J.D.P. (eds.) APPROX and RANDOM 2007. LNCS, vol. 4627, pp. 479–493. Springer, Heidelberg (2007)
Indyk, P.: Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM 53(3), 307–323 (2006)
Indyk, P., Woodruff, D.L.P.: Optimal approximations of the frequency moments of data streams. In: STOC 2005: Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, pp. 202–208. ACM, New York (2005)
Jayram, T.S., McGregor, A., Muthukrishnan, S., Vee, E.: Estimating statistical aggregates on probabilistic data streams. In: PODS 2007: Proceedings of the Twenty-sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 243–252. ACM, New York (2007)
Kane, D.M., Nelson, J., Woodruff, D.P.: On the exact space complexity of sketching and streaming small norms. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010 (2010)
Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: PODS 2010: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems of Data, pp. 41–52. ACM, New York (2010)
Li, P.: Compressed counting. In: SODA 2009: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 412–421. Society for Industrial and Applied Mathematics, Philadelphia (2009)
Nelson, J., Woodruff, D.P.: Fast manhattan sketches in data streams. In: PODS 2010: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems of Data, pp. 99–110. ACM, New York (2010)
Tirthapura, S., Woodruff, D.P.: Rectangle-efficient aggregation in spatial data streams. In: Proceedings of the 31st Symposium on Principles of Database Systems, PODS 2012, pp. 283–294. ACM, New York (2012)
Woodruff, D.P.: Optimal space lower bounds for all frequency moments. In: SODA 2004: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 167–175 (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Braverman, V., Ostrovsky, R. (2013). Generalizing the Layering Method of Indyk and Woodruff: Recursive Sketches for Frequency-Based Vectors on Streams. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX RANDOM 2013 2013. Lecture Notes in Computer Science, vol 8096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40328-6_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-40328-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40327-9
Online ISBN: 978-3-642-40328-6
eBook Packages: Computer ScienceComputer Science (R0)