skip to main content
10.1145/2452376.2452456acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm

Published: 18 March 2013 Publication History

Abstract

Cardinality estimation has a wide range of applications and is of particular importance in database systems. Various algorithms have been proposed in the past, and the HyperLogLog algorithm is one of them. In this paper, we present a series of improvements to this algorithm that reduce its memory requirements and significantly increase its accuracy for an important range of cardinalities. We have implemented our proposed algorithm for a system at Google and evaluated it empirically, comparing it to the original HyperLogLog algorithm. Like HyperLogLog, our improved algorithm parallelizes perfectly and computes the cardinality estimate in a single pass.

References

[1]
K. Aouiche and D. Lemire. A comparison of five probabilistic view-size estimation techniques in OLAP. In Workshop on Data Warehousing and OLAP (DOLAP), pages 17--24, 2007.
[2]
Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In Workshop on Randomization and Approximation Techniques (RANDOM), pages 1--10, London, UK, UK, 2002. Springer-Verlag.
[3]
P. Clifford and I. A. Cosma. A statistical analysis of probabilistic counting algorithms. Scandinavian Journal of Statistics, pages 1--14, 2011.
[4]
M. Durand and P. Flajolet. Loglog counting of large cardinalities. In G. D. Battista and U. Zwick, editors, European Symposium on Algorithms (ESA), volume 2832, pages 605--617, 2003.
[5]
C. Estan, G. Varghese, and M. Fisk. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Transactions on Networking, pages 925--937, 2006.
[6]
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182--209, 1985.
[7]
P. Flajolet, Éric Fusy, O. Gandouet, and F. Meunier. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm. In Analysis of Algorithms (AOFA), pages 127--146, 2007.
[8]
F. Giroire. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, 157(2):406--427, 2009.
[9]
A. Hall, O. Bachmann, R. Büssow, S. Gănceanu, and M. Nunkesser. Processing a trillion cells per mouse click. In Very Large Databases (VLDB), 2012.
[10]
P. Indyk. Tight lower bounds for the distinct elements problem. In Foundations of Computer Science (FOCS), pages 283--288, 2003.
[11]
D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Principles of database systems (PODS), pages 41--52. ACM, 2010.
[12]
J. Lumbroso. An optimal cardinality estimation algorithm based on order statistics and its full analysis. In Analysis of Algorithms (AOFA), pages 489--504, 2010.
[13]
S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, T. Vassilakis, and G. Inc. Dremel: Interactive analysis of web-scale datasets. In Very Large Databases (VLDB), pages 330--339, 2010.
[14]
A. Metwally, D. Agrawal, and A. E. Abbadi. Why go logarithmic if we can go linear? Towards effective distinct counting of search traffic. In Extending database technology (EDBT), pages 618--629, 2008.
[15]
R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data, parallel analysis with Sawzall. Journal on Scientific Programming, pages 277--298, 2005.
[16]
K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15:208--229, 1990.

Cited By

View all
  • (2025)HELP: Everlasting Privacy through Server-Aided RandomnessIACR Communications in Cryptology10.62056/a3w7tr-10k1:4Online publication date: 13-Jan-2025
  • (2025)In Search of a Memory-Efficient Framework for Online Cardinality EstimationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348657137:1(392-407)Online publication date: Jan-2025
  • (2024)Knowledge graphs can be learned with just intersection featuresProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693117(26199-26214)Online publication date: 21-Jul-2024
  • Show More Cited By

Index Terms

  1. HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Other conferences
      EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology
      March 2013
      793 pages
      ISBN:9781450315975
      DOI:10.1145/2452376
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 March 2013

      Permissions

      Request permissions for this article.

      Check for updates

      Qualifiers

      • Research-article

      Conference

      EDBT/ICDT '13

      Acceptance Rates

      Overall Acceptance Rate 7 of 10 submissions, 70%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)231
      • Downloads (Last 6 weeks)21
      Reflects downloads up to 15 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2025)HELP: Everlasting Privacy through Server-Aided RandomnessIACR Communications in Cryptology10.62056/a3w7tr-10k1:4Online publication date: 13-Jan-2025
      • (2025)In Search of a Memory-Efficient Framework for Online Cardinality EstimationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348657137:1(392-407)Online publication date: Jan-2025
      • (2024)Knowledge graphs can be learned with just intersection featuresProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693117(26199-26214)Online publication date: 21-Jul-2024
      • (2024)Unmasking vulnerabilitiesProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3692093(550-576)Online publication date: 21-Jul-2024
      • (2024)DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset MiningMathematics10.3390/math1224393012:24(3930)Online publication date: 13-Dec-2024
      • (2024)Learned Query Optimization by Constraint-Based Query Plan AugmentationMathematics10.3390/math1219310212:19(3102)Online publication date: 3-Oct-2024
      • (2024)Partition, Don't Sort! Compression Boosters for Cloud Data Ingestion PipelinesProceedings of the VLDB Endowment10.14778/3681954.368201317:11(3456-3469)Online publication date: 1-Jul-2024
      • (2024)UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct CountingProceedings of the VLDB Endowment10.14778/3654621.365463217:7(1655-1668)Online publication date: 1-Mar-2024
      • (2024)A Universal Sketch for Estimating Heavy Hitters and Per-Element Frequency Moments in Data Streams with Bounded DeletionsProceedings of the ACM on Management of Data10.1145/36987992:6(1-28)Online publication date: 20-Dec-2024
      • (2024)An LDP Compatible Sketch for Securely Approximating Set Intersection CardinalitiesProceedings of the ACM on Management of Data10.1145/36392812:1(1-27)Online publication date: 26-Mar-2024
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media