Identifying heavy hitters in high-speed network monitoring

Zhang, Yu; Fang, BinXing; Zhang, YongZheng

doi:10.1007/s11432-010-0053-5

Identifying heavy hitters in high-speed network monitoring

Research Papers
Published: 26 March 2010

Volume 53, pages 659–676, (2010)
Cite this article

Science China Information Sciences Aims and scope Submit manuscript

Yu Zhang¹,
BinXing Fang^1,2 &
YongZheng Zhang²

158 Accesses
9 Citations
Explore all metrics

Abstract

Identifying heavy hitters in a network traffic stream is important for a variety of network applications ranging from traffic engineering to anomaly detection such as detection of denial-of-service attacks. Existing methods generally examine newly arriving items in the stream, perform a small number of operations using a small amount of memory, and still provide guarantees on the identifying accuracy. In high-speed network monitoring, the update speed per item is extremely critical. However, so far as we know, there are no identifying algorithms which can provide constant update time (O(1)) in a weighted data stream. In this paper, we present an algorithm named Weighted Lossy Counting (WLC) which is able to identify heavy hitters in a high-speed weighted data stream with constant update time. WLC employs a novel efficient partially ordered data structure which is able to provide a fast per-item update speed while keeping the memory cost relatively low. We compare WLC with state-of-the-art algorithms for finding heavy hitters in real traffic traces. The experimental results show that WLC performs well in accuracy (recall, precision and average relative error) as other algorithms; moreover it has a much higher update speed at the cost of relatively larger memory space used. A theoretical worst-case memory bound of WLC is also derived in this paper; however, experiments with long real traffic traces show that WLC requires much less space than the theoretical bound in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Sommer R, Feldmann A. NetFlow: Information loss or win? In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2002. 173–174
Chapter Google Scholar
Fred S, Bonald T, Proutiere A, et al. Statistical bandwidth sharing: A study of congestion at flow level. ACM SIGCOMM Comput Commun Rev, 2001, 31: 111–122
Article Google Scholar
Mori T, Kawahara S, Naito S, et al. On the characteristics of Internet traffic variability: Spikes and elephants. In: Proceedings of the 2004 International Symposium on Applications and the Internet. Tokyo: IEEE, 2004. 99–106
Google Scholar
Papagiannaki K, Taft N, Bhattachayya S, et al. On the feasibility of identifying elephants in Internet backbone traffic. Sprint ATL Technical Report TR01-ATL-110918, 2001
Zhang Y, Breslau L, Paxson V, et al. On the characteristics and origins of internet flow rates. In: Proceedings of the 2002 SIGCOMM Conference. New York: ACM, 2002. 309–322
Google Scholar
Fang W J, Peterson L. Inter-AS traffic patterns and their implications. In: Proceedings of GLOBECOM’99. Rio de Janeireo: IEEE, 1999. 1859–1868
Google Scholar
Mahajan R, Floyd S, Wetherall D. Controlling high-bandwidth flows at the congested router. In: Proceedings of IEEE ICNP’01. Washington: IEEE, 2001. 192
Google Scholar
Feldmann A, Greenberg A, Lund C, et al. Deriving traffic demands for operational IP networks: Methodology and experience. IEEE/ACM Trans Netw (ToN), 2001, 9: 265–280
Article Google Scholar
Estan C, Varghese G. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans Comput Syst (ToCS), 2003, 21: 270–313
Article Google Scholar
Wang F Y, Yun X C, Wang X F, et al. Identifying heavy hitters in high-speed network monitoring (in Chinese). J Softw, 2007, 18: 3060–3070
Google Scholar
Wang H B, Pei Y J, Lin Y, et al. A LRU based algorithm for identifying and measuring large flows (in Chinese). J Electron Inf Tech, 2007, 29: 2487–2492
Google Scholar
Pei Y J, Wang H B, Cheng S D, et al. A dual-LRU based algorithm for identifying and measuring large flows (in Chinese). Acta Electron Sin, 2009, 37: 684–691
Google Scholar
Raspall F, Sallent S. Adaptive shared-state sampling. In: Proceedings of the 8th ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2008. 271–284
Google Scholar
Kumar A, Sung M, Xu J, et al. Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems. New York: ACM, 2004. 177–188
Google Scholar
Metwally A, Agrawal D, El Abbadi A. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans Database Syst (TODS), 2006, 31: 1095–1133
Article Google Scholar
Charikar M, Chen K, Farach-Colton M. Finding frequent items in data streams. Theor Comput Sci, 2004, 312: 3–15
Article MATH MathSciNet Google Scholar
Cormode G, Muthukrishnan S. An improved data stream summary: The count-min sketch and its applications. J Algorithms, 2005, 55: 58–75
Article MATH MathSciNet Google Scholar
Karp R, Shenker S, Papadimitriou C. A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst (TODS), 2003, 28: 51–55
Article Google Scholar
Manku G, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong: VLDB Endowment, 2002. 346–357
Chapter Google Scholar
Wang W P, Li J Z, Zhang D D, et al. An efficient algorithm for mining approximate frequent item over data streams (in Chinese). J Software, 2007, 18: 884–892
Article MATH Google Scholar
Jin C, Qian W, Sha C, et al. Dynamically maintaining frequent items over a data stream. In: Proceedings of the 12th International Conference on Information and Knowledge Management. New York: ACM, 2003. 287–294
Google Scholar
Cormode G, Muthukrishnan S. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans Database Syst (TODS), 2005, 30: 249–278
Article Google Scholar
Cormode G, Hadjieleftheriou M. Finding frequent items in data streams. In: Proceedings of the VLDB Endowment. Auckland: VLDB Endowment, 2008. 1530–1541
Google Scholar
Pang Y H, Wang J L, Xu C F. State-of-the-art on frequent pattern mining in data streams (in Chinese). Acta Automat Sin, 2006, 32: 594–602
Google Scholar
Agilent Technologies. JTC 003: Mixed packet size throughput. J Internet Test Method, 2004, 16–18. Available at http://advanced.comms.agilent.com/n2x/docs/journal/index.htm
Cao Z, Wang Z. Flow identification for supporting per-flow queueing. In: Proceedings of 9th International Conference on Computer Communications and Networks. Las Vegas: IEEE, 2000. 88–93
Google Scholar
Cormode G, Hadjieleftheriou M. Finding frequent items in data streams: Source code. http://www.research.att.com/~marioh/frequent-items/index.html
Cormode G, Muthukrishnan S. MassDAL public code bank. http://www.cs.rutgers.edu/~muthu/massdal.html
CAIDA anonymized OC48 Internet traces dataset. http://www.caida.org/data/passive/passive oc48 dataset.xml
CAIDA anonymized 2008 Internet traces dataset. http://www.caida.org/data/passive/passive 2008 dataset.xml

Download references

Author information

Authors and Affiliations

Research Center of Computer Network and Information Security Technology, Harbin Institute of Technology, Harbin, 150001, China
Yu Zhang & BinXing Fang
Research Center of Information Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
BinXing Fang & YongZheng Zhang

Authors

Yu Zhang
View author publications
You can also search for this author in PubMed Google Scholar
BinXing Fang
View author publications
You can also search for this author in PubMed Google Scholar
YongZheng Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yu Zhang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, Y., Fang, B. & Zhang, Y. Identifying heavy hitters in high-speed network monitoring. Sci. China Inf. Sci. 53, 659–676 (2010). https://doi.org/10.1007/s11432-010-0053-5

Download citation

Received: 15 July 2009
Accepted: 19 December 2009
Published: 26 March 2010
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11432-010-0053-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Identifying heavy hitters in high-speed network monitoring

Abstract

Access this article

Similar content being viewed by others

Fast counting the cardinality of flows for big traffic over sliding windows

Fine-grained probability counting for cardinality estimation of data streams

D2Sketch: Supporting Efficient Identification of Heavy Hitters Over Sliding Windows

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Identifying heavy hitters in high-speed network monitoring

Abstract

Access this article

Similar content being viewed by others

Fast counting the cardinality of flows for big traffic over sliding windows

Fine-grained probability counting for cardinality estimation of data streams

D2Sketch: Supporting Efficient Identification of Heavy Hitters Over Sliding Windows

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation