Skip to main content
Log in

Fast counting the cardinality of flows for big traffic over sliding windows

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Counting the cardinality of flows for massive high-speed traffic over sliding windows is still a challenging work under time and space constrains, but plays a key role in many network applications, such as traffic management and routing optimization in software defined network. In this paper, we propose a novel data structure (called LRU-Sketch) to address the problem. The significant contributions are as follows. 1) The proposed data structure adapts a well-known probabilistic sketch to sliding window model; 2) By using the least-recently used (LRU) replacement policy, we design a highly time-efficient algorithm for timely forgetting stale information, which takes constant (O(1)) time per time slot; 3) Moreover, a further memory-reducing schema is given at a cost of very little loss of accuracy; 4) Comprehensive experiments, performed on two real IP trace files, confirm that the proposed schema attains high accuracy and high time efficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Callegari C, Giordano S, Pagano M, Procissi G. Opencounter: counting unknown flows in software defined networks. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems. 2015, 1–7

    Google Scholar 

  2. Callegari C, Pietro A D, Giordano S, Pepe T, Procissi G. The loglog counting reversible sketch: a distributed architecture for detecting anomalies in backbone networks. In: Proceedings of IEEE International Conference on Communications. 2012, 1287–1291

    Google Scholar 

  3. Estan C, Varghese G. New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Transactions on Computer Systems, 2003, 21(3): 270–313

    Article  Google Scholar 

  4. Estan C, Varghese G, Fisk M E. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Transactions on Networking, 2006, 14(5): 925–937

    Article  Google Scholar 

  5. Chen W J, Liu Y, Guan Y. Cardinality change-based early detection of large-scale cyber-attacks. In: Proceedings of IEEE International Conference on Communications. 2013, 1788–1796

    Google Scholar 

  6. Cao J, Jin Y, Chen A, Bu T, Zhang Z L. Identifying high cardinality internet hosts. In: Proceedings of IEEE International Conference on Communications. 2009, 810–818

    Google Scholar 

  7. Zheng Y Q, Li M. ZOE: fast cardinality estimation for large-scale RFID systems. In: Proceedings of IEEE International Conference on Communications. 2013, 908–916

    Google Scholar 

  8. Chen A, Li L E, Cao J. Tracking cardinality distributions in network traffic. In: Proceedings of IEEE International Conference on Communications. 2009, 819–827

    Google Scholar 

  9. Huang Q, Lee P P C. Ld-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. In: Proceedings of IEEE International Conference on Communications. 2014, 1420–1428

    Google Scholar 

  10. Huang Q, Lee P P C. A hybrid local and distributed sketching design for accurate and scalable heavy key detection in network data streams. Computer Networks, 2015, 91: 298–315

    Article  Google Scholar 

  11. Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data stream systems. In: Proceedings of the 21st Symposium on Principles of Database Systems. 2002, 1–16

    Google Scholar 

  12. Gibbons P B, Matias Y. Synopsis data structures for massive data sets. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms. 1999, 909–910

    Google Scholar 

  13. Hua Y, Xiao B, Veeravalli B, Feng D. Locality-sensitive bloom filter for approximate membership query. IEEE Transactions on Computers, 2012, 61(6): 817–830

    Article  MathSciNet  Google Scholar 

  14. Yu Y, Qian C, Li X. Distributed and collaborative traffic monitoring in software defined networks. In: Proceedings of the 3rd Workshop on Hot Topics in Software Defined Networking. 2014, 85–90

    Google Scholar 

  15. Whang K, Zanden B T V, Taylor H M. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 1990, 15(02): 208–229

    Article  Google Scholar 

  16. Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209

    Article  MathSciNet  MATH  Google Scholar 

  17. Giroire F. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, 2009, 157(2): 406–427

    Article  MathSciNet  MATH  Google Scholar 

  18. Durand M, Flajolet P. Loglog counting of large cardinalities (extended abstract). In: Proceedings of European Symposium on Algorithms. 2003, 605–617

    Google Scholar 

  19. Oneil E, Oneil P, Weikum G. The LRU-K page replacement algorithm for database disk buffering. In: Proceedings of ACM Special Interest Group on Management Of Data. 1993, 297–306

    Google Scholar 

  20. Heule S, Nunkesser M, Hall A. Hyperloglog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In: Proceedings of Joint International Conference on Extending Database Technology. 2013, 683–692

    Google Scholar 

  21. Chen A, Cao J. Distinct counting with a self-learning bitmap. In: Proceedings of the 25th International Council for Open and Distance Education. 2009, 1171–1174

    Google Scholar 

  22. Metwally A, Agrawal D, El Abbadi A. Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic. In: Proceedings of Joint International Conference on Extending Database Technology. 2008, 618–629

    Google Scholar 

  23. Aouiche K, Lemire D. A comparison of five probabilistic view-size estimation techniques in OLAP. In: Proceedings of the 10th ACM International Workshop on Data Warehousing and OLAP. 2007, 17–24

    Google Scholar 

  24. Chabchoub Y, Chiky R, Dogan B. How can sliding hyperloglog and EWMA detect port scan attacks in IP traffic? EURASIP Journal on Information Security, 2014, 2014: 5

    Article  Google Scholar 

  25. Ben-Basat R, Einziger G, Friedman R, Kassner Y. Heavy hitters in streams and sliding windows. In: Proceedings of IEEE International Conference on Communications. 2016

    Google Scholar 

  26. Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 2002, 31(6): 1794–1813

    Article  MathSciNet  MATH  Google Scholar 

  27. Kim H, O’Hallaron D R. Counting network flows in real time. In: Proceedings of IEEE Global Communications Conference. 2003, 3888–3893

    Google Scholar 

  28. Sanjuàs-Cuxart J, Barlet-Ros P, Solé-Pareta J. Counting flows over sliding windows in high speed networks. In: Proceedings of International Conference on Research in Networking. 2009, 79–91

    Google Scholar 

  29. Yi K, Wang L, Wei Z W. Indexing for summary queries: theory and practice. ACM Transactions on Database Systems, 2014, 39(1): 2

    Article  MathSciNet  MATH  Google Scholar 

  30. Zhang Z, Wang B Q, Lan J L. Identifying elephant flows in Internet backbone traffic with bloom filters and LRU. Computer Communications, 2015, 61: 70–78

    Article  Google Scholar 

  31. Mitzenmacher M, Upfal E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. New York: Cambridge University Press, 2005

    Book  MATH  Google Scholar 

  32. Jain R, Rountier S A. Packet trains-measurements and a new model for computer network traffic. IEEE Journal on Selected Areas in Communications, 1986, 4(6): 986–995

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by the National High Technology Research and Development Program of China (2012AA01A510 and 2012AA01A509), and partially supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61402518, 61403060), and the Jiangsu Province Science Foundation for Youths (BK20150722).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Guiqiang Ni.

Additional information

Jingsong Shan is a lecturer at Huaiyin Institute of Technology, China. He received his MS from Guizhou University, China in 2006. He is currently a PhD candidate in PLA University of Science and Technology, China. His current research interests include information retrieval, random algorithm, and machine learning.

Yinjin Fu is an assistant professor in PLA University of Science and Technology, China. He received the BS degree in mathematics from Nanjing University, China in 2006, the MS degree and PhD degree in computer science from the College of Computer at National University of Defense Technology, China in 2008 and 2013, respectively. During the PhD period, he also joined the Department of Computer Science and Engineering at University of Nebraska- Lincoln, USA as a visiting scholar from 2010 to 2012. He has more than 20 publications to his credit in journals and international conferences including IEEE TPDS, ACM ToS, JCST, MIDDLEWARE, CLUSTER, NAS, etc. Currently, his research interests include big data management, cloud storage, and distributed file systems.

Guiqiang Ni received the MS degree in radio communication from Nanjing Institute of Communications, China in 1987, and his PhD degree in computer application technology from PLA University of Science and Technology (PLA UST), China. He is now a full professor in PLA UST. His research interests include random algorithm, computer networks, and computer graphics.

Jianxin Luo is a lecturer at PLA University of Science and Technology (PLA UST), China. He received his BS, MS, and PhD degrees from PLA UST in 2005, 2008, and 2012, respectively. His research interests include algorithm design and computer graphics.

Zhaofeng Wu received his BS degree in information security from Nanjing University of Posts and Telecommunications, China in 2009 and his MS degree in network information security from PLA University of Science and Technology (PLA UST), China in 2012. He is currently a PhD candidate in PLA UST. His research interests include satellite networks, computer networks, and network security.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Shan, J., Fu, Y., Ni, G. et al. Fast counting the cardinality of flows for big traffic over sliding windows. Front. Comput. Sci. 11, 119–129 (2017). https://doi.org/10.1007/s11704-016-6053-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11704-016-6053-x

Keywords

Navigation