Abstract
Heavy-Hitter (HH) flows are well known in networking, mainly because they consume considerably more resources than the majority of flows. Their reliable detection and management are critical to optimising network performance. Nevertheless, to date, there is no generally accepted and widely used methodology for HH threshold selection: different works use distinct thresholds without the support of a detailed or systematic study. In this paper, we provide insights and suggestions on how to determine better-justified thresholds. Based on the obtained results, we conclude that no single threshold can be used universally to separate flows into HHs and non-HHs; a threshold that performs well in one network may underperform in another. Threshold and HH definitions are often application-dependent, and therefore threshold selection should include a detailed analysis of the network and its traffic. We also highlight that TCP and UDP flows should be classified with different thresholds because HHs exhibit different characteristics under the two protocols. Lastly, we point out that using more than one threshold improves the accuracy and efficacy of HH classification.
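The last two points of the abstract can be illustrated with a minimal sketch: flows are labelled with protocol-specific thresholds, and more than one threshold is applied per flow. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Flow` record, the choice of size and duration as the thresholded features, the rule that both thresholds must be met, and the numeric values, which are placeholders rather than recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are illustrative only."""
    protocol: str      # "TCP" or "UDP"
    size_bytes: int    # total bytes carried by the flow
    duration_s: float  # flow duration in seconds


# Placeholder thresholds, one set per protocol. These numbers are NOT from
# the paper; real values should come from an analysis of the target network.
THRESHOLDS = {
    "TCP": {"size_bytes": 10_000, "duration_s": 1.0},
    "UDP": {"size_bytes": 2_000, "duration_s": 5.0},
}


def is_heavy_hitter(flow: Flow, thresholds: dict = THRESHOLDS) -> bool:
    """Label a flow as HH only if it meets every threshold for its protocol."""
    limits = thresholds[flow.protocol]
    return (
        flow.size_bytes >= limits["size_bytes"]
        and flow.duration_s >= limits["duration_s"]
    )


if __name__ == "__main__":
    flows = [
        Flow("TCP", 50_000, 2.0),
        Flow("TCP", 50_000, 0.5),
        Flow("UDP", 5_000, 6.0),
    ]
    for flow in flows:
        label = "HH" if is_heavy_hitter(flow) else "non-HH"
        print(flow.protocol, flow.size_bytes, flow.duration_s, label)
```

Keeping the thresholds in a per-protocol table makes it straightforward to re-derive them for each network, which is in line with the abstract's observation that a threshold suited to one network may underperform in another.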
Acknowledgements
A. Duque-Torres was supported by the ISIF Internet Operations Research Grant (Project #E3164). A. Pekar and W.K.G. Seah were supported by VUW’s Huawei NZ Research Programme, Software-Defined Green Internet of Things (Project #E2881). A. Pekar completed his part of this work as a Postdoctoral Fellow at the School of Engineering and Computer Science, Victoria University of Wellington, New Zealand. A. Duque-Torres completed her part of this work at the University of Cauca, Colombia and the Victoria University of Wellington, New Zealand.
Author information
Contributions
All the authors participated in the conception and design of the work. Furthermore, all the authors believe that the manuscript represents valid work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Funding
Not applicable.
Cite this article
Pekar, A., Duque-Torres, A., Seah, W.K.G. et al. Knowledge Discovery: Can It Shed New Light on Threshold Definition for Heavy-Hitter Detection?. J Netw Syst Manage 29, 24 (2021). https://doi.org/10.1007/s10922-021-09593-w
DOI: https://doi.org/10.1007/s10922-021-09593-w