Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3321)

Abstract

This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in efficiently extracting quantitative characteristics of very large data sets. The algorithms are by nature probabilistic and based on hashing. They exploit properties of simple discrete probabilistic models, and their design is tightly coupled with their analysis, which is itself often founded on methods from analytic combinatorics. Singularly efficient solutions have been found that defy information-theoretic lower bounds applicable to deterministic algorithms. Characteristics such as the total number of elements, cardinality (the number of distinct elements), and frequency moments, as well as unbiased samples, can be gathered with little loss of information and only a small probability of failure. The algorithms are applicable to traffic monitoring in networks, to database query optimization, and to some of the basic tasks of data mining. They apply to massive data streams and in many cases require strictly minimal auxiliary storage.
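
To make the idea of counting with coin tosses concrete, here is a minimal sketch of an approximate counter in the style usually attributed to Morris: it estimates the total number of elements seen while storing only a small exponent. The abstract does not spell out any algorithm, so the class name `MorrisCounter`, the helper `average_estimate`, and the base-2 variant chosen here are assumptions made for illustration, not the paper's own presentation.

```python
import random


class MorrisCounter:
    """Approximate event counter that stores only a small exponent.

    Illustrative base-2 variant; names are hypothetical.
    """

    def __init__(self, rng=None):
        self.exponent = 0
        # Any object with a random() method returning a float in [0, 1).
        self._rng = rng if rng is not None else random.Random()

    def add(self):
        # Toss a biased coin: increment with probability 2**(-exponent).
        if self._rng.random() < 2.0 ** (-self.exponent):
            self.exponent += 1

    def estimate(self):
        # After n events, E[2**exponent] = n + 1, so this is unbiased.
        return 2 ** self.exponent - 1


def average_estimate(n_events, trials, seed=0):
    """Average the estimate over independent counters fed n_events each."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counter = MorrisCounter(rng)
        for _ in range(n_events):
            counter.add()
        total += counter.estimate()
    return total / trials
```

A single counter is noisy, since its estimate is always one less than a power of two, but the exponent needs only about log2(log2(n)) bits for n events. Averaging several independent counters, as `average_estimate` does, is one simple way to reduce the variance.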

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Flajolet, P. (2004). Counting by Coin Tossings. In: Maher, M.J. (ed.) Advances in Computer Science - ASIAN 2004. Higher-Level Decision Making. ASIAN 2004. Lecture Notes in Computer Science, vol 3321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30502-6_1

  • DOI: https://doi.org/10.1007/978-3-540-30502-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24087-7

  • Online ISBN: 978-3-540-30502-6

  • eBook Packages: Computer Science, Computer Science (R0)
