Abstract
We show an \(\Omega\big(n^{1-2/p} (\log M)/\varepsilon^2\big)\) bits of space lower bound for \((1+\varepsilon)\)-approximating the p-th frequency moment \(F_p = \|x\|_p^p = \sum_{i=1}^n |x_i|^p\) of a vector \(x \in \{-M, -M+1, \ldots, M\}^n\) with constant probability in the turnstile model for data streams, for any \(p > 2\) and \(\varepsilon \ge 1/n^{1/p}\) (we require \(\varepsilon \ge 1/n^{1/p}\) since there is a trivial \(O(n \log M)\) upper bound). This lower bound matches the space complexity of an upper bound of Ganguly for any \(\varepsilon < 1/\log^{O(1)} n\), and is the first of any bound in the long sequence of work on estimating \(F_p\) to be shown to be optimal up to a constant factor for any setting of parameters. Moreover, our technique improves the dependence on \(\varepsilon\) in known lower bounds for cascaded moments, also known as mixed norms. We also continue the study of tight bounds on the dimension of linear sketches (drawn from some distribution) required for estimating \(F_p\) over the reals. We show a dimension lower bound of \(\Omega(n^{1-2/p}/\varepsilon^2)\) for sketches providing a \((1+\varepsilon)\)-approximation to \(\|x\|_p^p\) with constant probability, for any \(p > 2\) and \(\varepsilon \ge 1/n^{1/p}\). This is again optimal for \(\varepsilon < 1/\log^{O(1)} n\).
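To make the setting concrete, the following is a minimal sketch of the trivial baseline mentioned above: it stores the whole vector \(x\) (roughly \(O(n \log M)\) bits) and computes \(F_p\) exactly under turnstile updates. It is not the paper's construction or Ganguly's algorithm. The names `ExactTurnstileFp` and `lower_bound_order` are hypothetical, and the helper simply evaluates \(n^{1-2/p} (\log M)/\varepsilon^2\) with the hidden constant ignored and a base-2 logarithm assumed.

```python
import math


class ExactTurnstileFp:
    """Trivial baseline (illustrative): keep every coordinate of x in memory."""

    def __init__(self, n):
        self.x = [0] * n

    def update(self, i, delta):
        # Turnstile update: coordinate i changes by delta, which may be negative.
        self.x[i] += delta

    def moment(self, p):
        # F_p = sum_i |x_i|^p, computed exactly from the stored vector.
        return sum(abs(v) ** p for v in self.x)


def lower_bound_order(n, p, eps, M):
    """Evaluate n^(1-2/p) * log2(M) / eps^2 (constant factor ignored)."""
    return n ** (1 - 2 / p) * math.log2(M) / eps ** 2


# Example stream of (index, delta) updates; the final vector is x = [2, -2, 4].
sketch = ExactTurnstileFp(3)
for i, delta in [(0, 3), (1, -2), (0, -1), (2, 4)]:
    sketch.update(i, delta)
f3 = sketch.moment(3)  # |2|^3 + |-2|^3 + |4|^3

# At eps = n^(-1/p) the bound expression equals n * log2(M), the cost of the baseline.
at_threshold = lower_bound_order(1024, 4, 1024 ** (-1 / 4), 2)
```

At the threshold \(\varepsilon = n^{-1/p}\), the expression becomes \(n^{1-2/p} \cdot n^{2/p} \log M = n \log M\), which is why smaller values of \(\varepsilon\) are excluded: below that point the trivial algorithm already meets the bound.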
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Woodruff, D.P. (2013). A Tight Lower Bound for High Frequency Moment Estimation with Small Error. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX/RANDOM 2013. Lecture Notes in Computer Science, vol 8096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40328-6_43
DOI: https://doi.org/10.1007/978-3-642-40328-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40327-9
Online ISBN: 978-3-642-40328-6
eBook Packages: Computer Science (R0)