Skip to main content
Log in

Identifying heavy hitters in high-speed network monitoring

  • Research Papers
  • Published:
Science China Information Sciences Aims and scope Submit manuscript

Abstract

Identifying heavy hitters in a network traffic stream is important for a variety of network applications ranging from traffic engineering to anomaly detection such as detection of denial-of-service attacks. Existing methods generally examine newly arriving items in the stream, perform a small number of operations using a small amount of memory, and still provide guarantees on the identifying accuracy. In high-speed network monitoring, the update speed per item is extremely critical. However, so far as we know, there are no identifying algorithms which can provide constant update time (O(1)) in a weighted data stream. In this paper, we present an algorithm named Weighted Lossy Counting (WLC) which is able to identify heavy hitters in a high-speed weighted data stream with constant update time. WLC employs a novel efficient partially ordered data structure which is able to provide a fast per-item update speed while keeping the memory cost relatively low. We compare WLC with state-of-the-art algorithms for finding heavy hitters in real traffic traces. The experimental results show that WLC performs well in accuracy (recall, precision and average relative error) as other algorithms; moreover it has a much higher update speed at the cost of relatively larger memory space used. A theoretical worst-case memory bound of WLC is also derived in this paper; however, experiments with long real traffic traces show that WLC requires much less space than the theoretical bound in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Sommer R, Feldmann A. NetFlow: Information loss or win? In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2002. 173–174

    Chapter  Google Scholar 

  2. Fred S, Bonald T, Proutiere A, et al. Statistical bandwidth sharing: A study of congestion at flow level. ACM SIGCOMM Comput Commun Rev, 2001, 31: 111–122

    Article  Google Scholar 

  3. Mori T, Kawahara S, Naito S, et al. On the characteristics of Internet traffic variability: Spikes and elephants. In: Proceedings of the 2004 International Symposium on Applications and the Internet. Tokyo: IEEE, 2004. 99–106

    Google Scholar 

  4. Papagiannaki K, Taft N, Bhattachayya S, et al. On the feasibility of identifying elephants in Internet backbone traffic. Sprint ATL Technical Report TR01-ATL-110918, 2001

  5. Zhang Y, Breslau L, Paxson V, et al. On the characteristics and origins of internet flow rates. In: Proceedings of the 2002 SIGCOMM Conference. New York: ACM, 2002. 309–322

    Google Scholar 

  6. Fang W J, Peterson L. Inter-AS traffic patterns and their implications. In: Proceedings of GLOBECOM’99. Rio de Janeireo: IEEE, 1999. 1859–1868

    Google Scholar 

  7. Mahajan R, Floyd S, Wetherall D. Controlling high-bandwidth flows at the congested router. In: Proceedings of IEEE ICNP’01. Washington: IEEE, 2001. 192

    Google Scholar 

  8. Feldmann A, Greenberg A, Lund C, et al. Deriving traffic demands for operational IP networks: Methodology and experience. IEEE/ACM Trans Netw (ToN), 2001, 9: 265–280

    Article  Google Scholar 

  9. Estan C, Varghese G. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans Comput Syst (ToCS), 2003, 21: 270–313

    Article  Google Scholar 

  10. Wang F Y, Yun X C, Wang X F, et al. Identifying heavy hitters in high-speed network monitoring (in Chinese). J Softw, 2007, 18: 3060–3070

    Google Scholar 

  11. Wang H B, Pei Y J, Lin Y, et al. A LRU based algorithm for identifying and measuring large flows (in Chinese). J Electron Inf Tech, 2007, 29: 2487–2492

    Google Scholar 

  12. Pei Y J, Wang H B, Cheng S D, et al. A dual-LRU based algorithm for identifying and measuring large flows (in Chinese). Acta Electron Sin, 2009, 37: 684–691

    Google Scholar 

  13. Raspall F, Sallent S. Adaptive shared-state sampling. In: Proceedings of the 8th ACM SIGCOMM Workshop on Internet Measurement. New York: ACM, 2008. 271–284

    Google Scholar 

  14. Kumar A, Sung M, Xu J, et al. Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems. New York: ACM, 2004. 177–188

    Google Scholar 

  15. Metwally A, Agrawal D, El Abbadi A. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans Database Syst (TODS), 2006, 31: 1095–1133

    Article  Google Scholar 

  16. Charikar M, Chen K, Farach-Colton M. Finding frequent items in data streams. Theor Comput Sci, 2004, 312: 3–15

    Article  MATH  MathSciNet  Google Scholar 

  17. Cormode G, Muthukrishnan S. An improved data stream summary: The count-min sketch and its applications. J Algorithms, 2005, 55: 58–75

    Article  MATH  MathSciNet  Google Scholar 

  18. Karp R, Shenker S, Papadimitriou C. A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst (TODS), 2003, 28: 51–55

    Article  Google Scholar 

  19. Manku G, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong: VLDB Endowment, 2002. 346–357

    Chapter  Google Scholar 

  20. Wang W P, Li J Z, Zhang D D, et al. An efficient algorithm for mining approximate frequent item over data streams (in Chinese). J Software, 2007, 18: 884–892

    Article  MATH  Google Scholar 

  21. Jin C, Qian W, Sha C, et al. Dynamically maintaining frequent items over a data stream. In: Proceedings of the 12th International Conference on Information and Knowledge Management. New York: ACM, 2003. 287–294

    Google Scholar 

  22. Cormode G, Muthukrishnan S. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans Database Syst (TODS), 2005, 30: 249–278

    Article  Google Scholar 

  23. Cormode G, Hadjieleftheriou M. Finding frequent items in data streams. In: Proceedings of the VLDB Endowment. Auckland: VLDB Endowment, 2008. 1530–1541

    Google Scholar 

  24. Pang Y H, Wang J L, Xu C F. State-of-the-art on frequent pattern mining in data streams (in Chinese). Acta Automat Sin, 2006, 32: 594–602

    Google Scholar 

  25. Agilent Technologies. JTC 003: Mixed packet size throughput. J Internet Test Method, 2004, 16–18. Available at http://advanced.comms.agilent.com/n2x/docs/journal/index.htm

  26. Cao Z, Wang Z. Flow identification for supporting per-flow queueing. In: Proceedings of 9th International Conference on Computer Communications and Networks. Las Vegas: IEEE, 2000. 88–93

    Google Scholar 

  27. Cormode G, Hadjieleftheriou M. Finding frequent items in data streams: Source code. http://www.research.att.com/~marioh/frequent-items/index.html

  28. Cormode G, Muthukrishnan S. MassDAL public code bank. http://www.cs.rutgers.edu/~muthu/massdal.html

  29. CAIDA anonymized OC48 Internet traces dataset. http://www.caida.org/data/passive/passive oc48 dataset.xml

  30. CAIDA anonymized 2008 Internet traces dataset. http://www.caida.org/data/passive/passive 2008 dataset.xml

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yu Zhang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, Y., Fang, B. & Zhang, Y. Identifying heavy hitters in high-speed network monitoring. Sci. China Inf. Sci. 53, 659–676 (2010). https://doi.org/10.1007/s11432-010-0053-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11432-010-0053-5

Keywords

Navigation