Jigsaw-Sketch: a fast and accurate algorithm for finding top-k elephant flows in high-speed networks

  • Research Paper
  • Science China Information Sciences

Abstract

Finding top-k elephant flows in high-speed networks is one of the most fundamental network measurement tasks. It is more challenging than per-flow size estimation because the IDs and sizes of the top-k flows must be tracked simultaneously. Most existing studies record the IDs of only a small number of elephant flows so that their estimators fit in the extremely limited high-speed on-chip memory. However, to track the elephant flows with high accuracy, these solutions need too many memory accesses per arriving packet, which limits their practicability. This paper therefore proposes Jigsaw-Sketch, a new algorithm that finds the top-k elephant flows with far fewer memory accesses while achieving high memory efficiency and accuracy. The design is built on a novel two-stage jigsaw storage scheme, which efficiently captures the candidate top-k flows from massive network streams and then finds the top-k elephant flows with high memory efficiency and only a few memory accesses per packet. Extensive experiments on real network traces show that, compared with the state of the art, Jigsaw-Sketch improves packet processing throughput by at least 86% while achieving a smaller memory footprint and higher accuracy.
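To make the task concrete, the following is a minimal sketch of a classic baseline for top-k tracking, the Space-Saving summary. It is not Jigsaw-Sketch, and the abstract does not describe the jigsaw storage scheme in enough detail to reproduce it here. The class name `SpaceSaving`, the parameter `capacity`, and the toy packet stream are illustrative assumptions. The sketch shows why flow IDs and sizes must be kept together, and how a naive bounded-memory tracker can end up touching many entries per packet.

```python
class SpaceSaving:
    """Illustrative Space-Saving summary (a baseline, NOT Jigsaw-Sketch).

    Keeps at most `capacity` (flow ID, estimated size) pairs.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts = {}  # flow ID -> estimated size

    def update(self, flow_id, size=1):
        """Process one packet of `size` units belonging to `flow_id`."""
        if flow_id in self.counts:
            self.counts[flow_id] += size
        elif len(self.counts) < self.capacity:
            self.counts[flow_id] = size
        else:
            # Table is full: evict the smallest entry. The newcomer inherits
            # its count, so estimates never underestimate the true size.
            # This naive version scans every entry to find the minimum.
            victim = min(self.counts, key=self.counts.get)
            smallest = self.counts.pop(victim)
            self.counts[flow_id] = smallest + size

    def top_k(self, k):
        """Return the k largest (flow ID, estimated size) pairs."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    # Hypothetical stream: two elephant flows followed by four mouse flows.
    packets = ["a"] * 50 + ["b"] * 30 + ["c", "d", "e", "f"] * 5
    summary = SpaceSaving(capacity=4)
    for flow in packets:
        summary.update(flow)
    print(summary.top_k(2))  # [('a', 50), ('b', 30)]
```

In this simplified version, each eviction scans the whole table for its minimum, which gives a rough feel for the kind of per-packet memory-access cost that the abstract identifies as a limitation of existing solutions.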

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62332013, 62072322, U20A20182, 62202322), the Natural Science Foundation of Jiangsu Province (Grant No. BK20210706), and the Jiangsu Planned Projects for Postdoctoral Research Funds (Grant No. 2021K165B).

Author information

Corresponding author

Correspondence to He Huang.

About this article

Cite this article

Zhang, B., Huang, H., Sun, YE. et al. Jigsaw-Sketch: a fast and accurate algorithm for finding top-k elephant flows in high-speed networks. Sci. China Inf. Sci. 67, 142101 (2024). https://doi.org/10.1007/s11432-022-3794-1

  • DOI: https://doi.org/10.1007/s11432-022-3794-1
