Abstract
Motivated by Dvir et al. [3] and Wigderson and Yehudayoff [10], we examine the question of discovering the set of heavy hitters of a distribution on strings (i.e., the set of strings with a certain minimum probability) from lossy or noisy samples. While the previous work concentrated on finding both the set of most probable elements and their probabilities, we consider enumeration, the problem of just finding a list that includes all the most probable elements without associated probabilities. Unlike Wigderson and Yehudayoff [10], we do not assume the underlying distribution has small support size, and our time bounds are independent of the support size. For the enumeration problem, we give a polynomial time algorithm for the lossy sample model for any constant erasure probability μ < 1, and a quasi-polynomial algorithm for the noisy sample model for any noise probability ν < 1/2 of flipping bits. We extend the lower bound from [3] on the number of samples required for the reconstruction problem to the enumeration problem, showing that when μ = 1 − o(1), no polynomial time algorithm exists.
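To make the lossy sample model concrete: each sample is an independent draw from the distribution in which every bit is erased, independently, with probability μ. The sketch below is a naive illustration (not the paper's polynomial time algorithm): since a length-n sample survives fully unerased with probability (1 − μ)^n, dividing the count of fully revealed copies of a string by that factor gives an unbiased estimate of its probability. All names and the toy distribution here are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

def lossy_sample(x, mu, rng):
    """Erase each bit of x independently with probability mu ('?' marks an erasure)."""
    return "".join(b if rng.random() >= mu else "?" for b in x)

def naive_heavy_hitters(samples, mu, n, threshold):
    """Toy estimator: a sample of a length-n string is fully revealed with
    probability (1 - mu)**n, so scaling the fully-revealed counts by that
    factor gives an unbiased probability estimate for each observed string.
    (Exponential in n -- only to illustrate the model, not the paper's method.)"""
    full = Counter(s for s in samples if "?" not in s)
    scale = (1 - mu) ** n * len(samples)
    return sorted(x for x, c in full.items() if c / scale >= threshold)

rng = random.Random(0)
# Hypothetical distribution: "1010" with probability 0.7, "0000" with probability 0.3.
population = ["1010"] * 7 + ["0000"] * 3
mu = 0.2
samples = [lossy_sample(rng.choice(population), mu, rng) for _ in range(20000)]
print(naive_heavy_hitters(samples, mu, 4, 0.2))
```

The correction factor blows up as μ → 1, which matches the abstract's lower bound: when μ = 1 − o(1), almost no sample is fully revealed, and enumeration cannot be done in polynomial time.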
References
1. Belkin, M., Sinha, K.: Polynomial learning of distribution families. In: FOCS 2010, pp. 103–112 (2010)
2. Dasgupta, S.: Learning mixtures of Gaussians. In: FOCS 1999, p. 634. IEEE Computer Society (1999)
3. Dvir, Z., Rao, A., Wigderson, A., Yehudayoff, A.: Restriction access. In: Innovations in Theoretical Computer Science 2012, pp. 19–33 (2012)
4. Goldreich, O., Levin, L.A.: A hard-core predicate for all one-way functions. In: STOC 1989, pp. 25–32 (1989)
5. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)
6. Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R.E., Sellie, L.: On the learnability of discrete distributions. In: STOC 1994, pp. 273–282 (1994)
7. Moitra, A., Saks, M.: A polynomial time algorithm for lossy population recovery. Manuscript (2013)
8. Moitra, A., Valiant, G.: Settling the polynomial learnability of mixtures of Gaussians. In: FOCS 2010, pp. 93–102 (2010)
9. Arora, S., Kannan, R.: Learning mixtures of arbitrary Gaussians. In: STOC 2001, pp. 247–257 (2001)
10. Wigderson, A., Yehudayoff, A.: Population recovery and partial identification. In: FOCS 2012, pp. 390–399 (2012)
11. Woeginger, G.J.: When does a dynamic programming formulation guarantee the existence of an FPTAS? In: SODA 1999, pp. 820–829 (1999)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Batman, L., Impagliazzo, R., Murray, C., Paturi, R. (2013). Finding Heavy Hitters from Lossy or Noisy Data. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX/RANDOM 2013. Lecture Notes in Computer Science, vol 8096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40328-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40327-9
Online ISBN: 978-3-642-40328-6