Abstract
Counting the cardinality of flows in massive high-speed traffic over sliding windows remains challenging under time and space constraints, yet it plays a key role in many network applications, such as traffic management and routing optimization in software-defined networks. In this paper, we propose a novel data structure, called LRU-Sketch, to address the problem. The main contributions are as follows: 1) the proposed data structure adapts a well-known probabilistic sketch to the sliding window model; 2) using the least-recently-used (LRU) replacement policy, we design a highly time-efficient algorithm for promptly forgetting stale information, which takes constant (O(1)) time per time slot; 3) a further memory-reducing scheme is given at the cost of a very small loss of accuracy; 4) comprehensive experiments on two real IP trace files confirm that the proposed scheme attains high accuracy and high time efficiency.
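The general idea of pairing a probabilistic sketch with LRU ordering can be illustrated with a minimal Python sketch. It is not the LRU-Sketch structure itself, whose details the abstract does not give. The assumptions here are: linear counting (a bitmap-style estimator) stands in for the unnamed "well-known probabilistic sketch", time slots arrive in non-decreasing order, and the names `SlidingLinearCounter`, `num_cells`, and `window_slots` are hypothetical.

```python
import hashlib
import math
from collections import OrderedDict


class SlidingLinearCounter:
    """Illustrative sliding-window distinct counter (not the paper's LRU-Sketch)."""

    def __init__(self, num_cells: int, window_slots: int):
        self.num_cells = num_cells        # size of the conceptual bitmap
        self.window_slots = window_slots  # window length, in time slots
        # Occupied cell index -> last slot in which the cell was hit.
        # Insertion order is kept least-recently-used first.
        self.cells = OrderedDict()

    def _cell(self, flow_id: str) -> int:
        # Hash the flow identifier to one of num_cells cells.
        digest = hashlib.sha1(flow_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_cells

    def add(self, flow_id: str, slot: int) -> None:
        # Assumes slot never decreases, so LRU order equals timestamp order.
        idx = self._cell(flow_id)
        self.cells[idx] = slot
        self.cells.move_to_end(idx)  # mark as most recently used

    def expire(self, current_slot: int) -> None:
        # Drop cells whose last hit fell out of the window. Only stale
        # cells at the LRU end are touched; fresh cells are never scanned.
        oldest_valid = current_slot - self.window_slots + 1
        while self.cells:
            idx, last_hit = next(iter(self.cells.items()))
            if last_hit >= oldest_valid:
                break
            self.cells.popitem(last=False)

    def estimate(self) -> float:
        # Linear counting: n is roughly -m * ln(fraction of empty cells).
        empty = self.num_cells - len(self.cells)
        if empty == 0:
            return float("inf")  # bitmap saturated; estimate undefined
        return -self.num_cells * math.log(empty / self.num_cells)
```

A caller would invoke `add(flow_id, slot)` for each packet, call `expire(slot)` when the slot advances, and read `estimate()` on demand. Keeping cells in LRU order means eviction inspects only the oldest entries, which is the intuition behind forgetting stale information cheaply.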
Acknowledgements
This work was supported by the National High Technology Research and Development Program of China (2012AA01A510 and 2012AA01A509), and partially supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61402518, 61403060), and the Jiangsu Province Science Foundation for Youths (BK20150722).
Author information
Jingsong Shan is a lecturer at Huaiyin Institute of Technology, China. He received his MS from Guizhou University, China in 2006. He is currently a PhD candidate at PLA University of Science and Technology, China. His current research interests include information retrieval, randomized algorithms, and machine learning.
Yinjin Fu is an assistant professor at PLA University of Science and Technology, China. He received his BS degree in mathematics from Nanjing University, China in 2006, and his MS and PhD degrees in computer science from the College of Computer at National University of Defense Technology, China in 2008 and 2013, respectively. During his PhD studies, he was also a visiting scholar in the Department of Computer Science and Engineering at University of Nebraska-Lincoln, USA from 2010 to 2012. He has more than 20 publications in journals and international conferences, including IEEE TPDS, ACM ToS, JCST, MIDDLEWARE, CLUSTER, and NAS. His current research interests include big data management, cloud storage, and distributed file systems.
Guiqiang Ni received his MS degree in radio communication from Nanjing Institute of Communications, China in 1987, and his PhD degree in computer application technology from PLA University of Science and Technology (PLA UST), China. He is now a full professor at PLA UST. His research interests include randomized algorithms, computer networks, and computer graphics.
Jianxin Luo is a lecturer at PLA University of Science and Technology (PLA UST), China. He received his BS, MS, and PhD degrees from PLA UST in 2005, 2008, and 2012, respectively. His research interests include algorithm design and computer graphics.
Zhaofeng Wu received his BS degree in information security from Nanjing University of Posts and Telecommunications, China in 2009 and his MS degree in network information security from PLA University of Science and Technology (PLA UST), China in 2012. He is currently a PhD candidate at PLA UST. His research interests include satellite networks, computer networks, and network security.
Cite this article
Shan, J., Fu, Y., Ni, G. et al. Fast counting the cardinality of flows for big traffic over sliding windows. Front. Comput. Sci. 11, 119–129 (2017). https://doi.org/10.1007/s11704-016-6053-x
DOI: https://doi.org/10.1007/s11704-016-6053-x