A Tight Lower Bound for High Frequency Moment Estimation with Small Error

  • Conference paper
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX 2013, RANDOM 2013)

Abstract

We show an \(\Omega(n^{1-2/p} \log M / \varepsilon^2)\) bits of space lower bound for \((1+\varepsilon)\)-approximating the \(p\)-th frequency moment \(F_p = \|x\|_p^p = \sum_{i=1}^n |x_i|^p\) of a vector \(x \in \{-M, -M+1, \ldots, M\}^n\) with constant probability in the turnstile model for data streams, for any \(p > 2\) and \(\varepsilon \ge 1/n^{1/p}\) (we require \(\varepsilon \ge 1/n^{1/p}\) since there is a trivial \(O(n \log M)\) upper bound). This lower bound matches the space complexity of an upper bound of Ganguly for any \(\varepsilon < 1/\log^{O(1)} n\), and is the first of any bound in the long sequence of work on estimating \(F_p\) to be shown to be optimal up to a constant factor for any setting of parameters. Moreover, our technique improves the dependence on \(\varepsilon\) in known lower bounds for cascaded moments, also known as mixed norms. We also continue the study of tight bounds on the dimension of linear sketches (drawn from some distribution) required for estimating \(F_p\) over the reals. We show a dimension lower bound of \(\Omega(n^{1-2/p}/\varepsilon^2)\) for sketches providing a \((1+\varepsilon)\)-approximation to \(\|x\|_p^p\) with constant probability, for any \(p > 2\) and \(\varepsilon \ge 1/n^{1/p}\). This is again optimal for \(\varepsilon < 1/\log^{O(1)} n\).
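To make the setting concrete, the following is a minimal sketch of the trivial baseline mentioned in the abstract: keep the whole vector x in memory (n counters of O(log M) bits each), apply turnstile updates, and compute F_p exactly. It is an illustration only, not the paper's construction. The function names, the 0-based indexing, and the choice of the two-sided convention (1 − ε)·F_p ≤ estimate ≤ (1 + ε)·F_p for a (1 + ε)-approximation are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def exact_fp(n: int, p: float, updates: Iterable[Tuple[int, int]]) -> float:
    """Trivial baseline (hypothetical helper): store x explicitly and return
    F_p = sum_i |x_i|^p after all turnstile updates have been applied."""
    x: List[int] = [0] * n              # n counters; O(log M) bits each when |x_i| <= M
    for i, delta in updates:            # turnstile update: x_i <- x_i + delta
        x[i] += delta                   # delta may be negative (deletions allowed)
    return float(sum(abs(v) ** p for v in x))


def is_one_plus_eps_approx(estimate: float, truth: float, eps: float) -> bool:
    """Check the (assumed) two-sided guarantee
    (1 - eps) * truth <= estimate <= (1 + eps) * truth."""
    return (1 - eps) * truth <= estimate <= (1 + eps) * truth


if __name__ == "__main__":
    # Stream of (index, delta) pairs over n = 3 coordinates; final x = [2, -2, 5].
    stream = [(0, 3), (1, -2), (0, -1), (2, 5)]
    f3 = exact_fp(3, 3, stream)         # |2|^3 + |-2|^3 + |5|^3
    print(f3)
    print(is_one_plus_eps_approx(150.0, f3, 0.1))
```

This baseline also shows why the restriction on ε is natural: at ε = 1/n^{1/p} the quantity n^{1−2/p}/ε² equals n^{1−2/p} · n^{2/p} = n, so the lower bound reaches Ω(n log M), which is the space the baseline already uses.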

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Woodruff, D.P. (2013). A Tight Lower Bound for High Frequency Moment Estimation with Small Error. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX 2013, RANDOM 2013. Lecture Notes in Computer Science, vol 8096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40328-6_43

  • DOI: https://doi.org/10.1007/978-3-642-40328-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40327-9

  • Online ISBN: 978-3-642-40328-6

  • eBook Packages: Computer Science, Computer Science (R0)
