Abstract
Finding the top-k elephant flows in high-speed networks is one of the most fundamental network measurement tasks. It is more challenging than per-flow size estimation because the IDs and sizes of the top-k flows must be tracked simultaneously. Most existing studies record the IDs of only a small number of elephant flows so that their estimators fit in the extremely limited high-speed on-chip memory. However, to track elephant flows with high accuracy, these solutions require too many memory accesses per arriving packet, which limits their practicality. This paper therefore proposes Jigsaw-Sketch, a new algorithm that finds the top-k elephant flows with far fewer memory accesses while achieving high memory efficiency and accuracy. At its core is a novel two-stage jigsaw storage scheme, which efficiently captures candidate top-k flows from massive network streams and then identifies the top-k elephant flows with high memory efficiency and only a few memory accesses per packet. Extensive experiments on real network traces show that Jigsaw-Sketch improves packet processing throughput by at least 86% while achieving a smaller memory footprint and higher accuracy than state-of-the-art (SOTA) methods.
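To make the problem concrete, the example below shows one classic way to keep flow IDs and flow sizes together in a fixed number of counters: the Space-Saving algorithm of Metwally et al. It is not Jigsaw-Sketch, whose two-stage jigsaw storage scheme is not specified in this abstract. The class and method names (`SpaceSaving`, `insert`, `top_k`) are hypothetical, and the linear scan for the smallest counter is a simplification chosen for brevity.

```python
class SpaceSaving:
    """Space-Saving top-k tracker with a fixed number of (flow ID, count) slots.

    Illustrative baseline only; not the Jigsaw-Sketch data structure.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity            # number of monitored flows (slots)
        self.counts: dict[str, int] = {}    # flow ID -> estimated size

    def insert(self, flow_id: str) -> None:
        """Process one packet belonging to flow_id."""
        if flow_id in self.counts:
            # Already monitored: increment its counter.
            self.counts[flow_id] += 1
        elif len(self.counts) < self.capacity:
            # Free slot available: start monitoring the new flow.
            self.counts[flow_id] = 1
        else:
            # Table full: evict the flow with the smallest counter.
            # The newcomer inherits that count plus one, so a monitored
            # flow's estimate never underestimates its true size.
            # (Linear scan for simplicity; real versions use a heap or
            # the Stream-Summary structure.)
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[flow_id] = self.counts.pop(victim) + 1

    def top_k(self, k: int) -> list[tuple[str, int]]:
        """Return the k flows with the largest estimates, largest first."""
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Usage: three elephant flows mixed with 40 single-packet mouse flows.
packets = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"m{i}" for i in range(40)]
tracker = SpaceSaving(capacity=8)
for p in packets:
    tracker.insert(p)
print(tracker.top_k(3))  # → [('a', 50), ('b', 30), ('c', 20)]
```

In this toy version every packet from an unmonitored flow triggers a scan over all slots, which hints at why the number of memory accesses per packet becomes the bottleneck at line rate, the cost the abstract says Jigsaw-Sketch is designed to reduce.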
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (Grant Nos. 62332013, 62072322, U20A20182, 62202322), Natural Science Foundation of Jiangsu Province (Grant No. BK20210706), and Jiangsu Planned Projects for Postdoctoral Research Funds (Grant No. 2021K165B).
About this article
Cite this article
Zhang, B., Huang, H., Sun, YE. et al. Jigsaw-Sketch: a fast and accurate algorithm for finding top-k elephant flows in high-speed networks. Sci. China Inf. Sci. 67, 142101 (2024). https://doi.org/10.1007/s11432-022-3794-1
DOI: https://doi.org/10.1007/s11432-022-3794-1