DOI: 10.1145/2745844.2745870
research-article

Hyper-Compact Virtual Estimators for Big Network Data Based on Register Sharing

Published: 15 June 2015

Abstract

Cardinality estimation over big network data consisting of numerous flows is a fundamental problem with many practical applications. Traditionally, research on this problem has focused on using a small amount of memory to estimate each flow's cardinality over a large range (up to $10^9$). However, although the memory needed for each flow has been greatly compressed, the overall memory demand can still be very high when there is an extremely large number of flows, exceeding what is available in some important scenarios, such as online measurement modules implemented in network processors that use only on-chip cache memory. In this paper, instead of allocating a separate data structure (called an estimator) to each flow, we take a different path by viewing all the flows together as a whole: each flow is allocated a virtual estimator, and these virtual estimators share a common memory space. We find that sharing at the register (multi-bit) level is superior to sharing at the bit level. We propose a framework of virtual estimators that lets us apply the idea of sharing to a range of cardinality estimation solutions, achieving far better memory efficiency than the best existing work. Our experiments show that the new solution can work in a tight memory space of less than 1 bit per flow, or even one tenth of a bit per flow, which has not been achieved before.

Cited By

  • (2025) In Search of a Memory-Efficient Framework for Online Cardinality Estimation. IEEE Transactions on Knowledge and Data Engineering, 37(1):392-407. DOI: 10.1109/TKDE.2024.3486571. Online publication date: Jan-2025.
  • (2024) Enhancing Accuracy for Super Spreader Identification in High-Speed Data Streams. Proceedings of the VLDB Endowment, 17(11):3124-3137. DOI: 10.14778/3681954.3681988. Online publication date: 30-Aug-2024.
  • (2024) From CountMin to Super kJoin Sketches for Flow Spread Estimation. IEEE Transactions on Network Science and Engineering, 11(3):2353-2370. DOI: 10.1109/TNSE.2023.3279665. Online publication date: May-2024.

      Information

      Published In

      SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
      June 2015
      488 pages
      ISBN:9781450334860
      DOI:10.1145/2745844
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. big network data
      2. cardinality estimation
      3. network stream monitoring

      Funding Sources

      • National Science Foundation of United States

      Conference

      SIGMETRICS '15

      Acceptance Rates

      SIGMETRICS '15 Paper Acceptance Rate 32 of 239 submissions, 13%;
      Overall Acceptance Rate 459 of 2,691 submissions, 17%

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 44
      • Downloads (last 6 weeks): 6
      Reflects downloads up to 16 Jan 2025

      Citations

      • (2024) Compact Estimator for Streaming Triangle Counting. IEEE Transactions on Knowledge and Data Engineering, 36(8):3712-3724. DOI: 10.1109/TKDE.2024.3371228. Online publication date: Aug-2024.
      • (2024) KTSketch: Finding k-Persistent t-Spread Flows in High-Speed Networks. Web and Big Data, pages 326-342. DOI: 10.1007/978-981-97-7241-4_21. Online publication date: 28-Aug-2024.
      • (2023) Accurate and O(1)-Time Query of Per-Flow Cardinality in High-Speed Networks. IEEE/ACM Transactions on Networking, 31(6):2994-3009. DOI: 10.1109/TNET.2023.3268980. Online publication date: Dec-2023.
      • (2023) Randomized Error Removal for Online Spread Estimation in High-Speed Networks. IEEE/ACM Transactions on Networking, 31(2):558-573. DOI: 10.1109/TNET.2022.3197968. Online publication date: Apr-2023.
      • (2023) Fast Gumbel-Max Sketch and its Applications. IEEE Transactions on Knowledge and Data Engineering, 35(9):9350-9363. DOI: 10.1109/TKDE.2023.3237857. Online publication date: 1-Sep-2023.
      • (2023) Couper: Memory-Efficient Cardinality Estimation under Unbalanced Distribution. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2753-2765. DOI: 10.1109/ICDE55515.2023.00211. Online publication date: Apr-2023.
      • (2023) FastSO: A Fast Weighted Cardinality Estimation Algorithm. 2023 3rd International Conference on Electronic Information Engineering and Computer (EIECT), pages 494-499. DOI: 10.1109/EIECT60552.2023.10441998. Online publication date: 17-Nov-2023.
