
Knowledge Discovery: Can It Shed New Light on Threshold Definition for Heavy-Hitter Detection?

Journal of Network and Systems Management

Abstract

Heavy-Hitter (HH) flows are well known in networking, mainly because they consume considerably more resources than the majority of flows. Reliable detection and management of HHs are critical to optimising network performance. Nevertheless, to date there is no generally accepted and widely used methodology for HH threshold selection: different works use distinct thresholds without the support of a detailed or systematic study. In this paper, we provide insights and suggestions on how to determine more justified and valid thresholds. Based on the obtained results, we conclude that no threshold can be used universally to separate flows into HHs and non-HHs; a threshold that performs well in one network may underperform in another. Threshold and HH definitions are often application-dependent, and therefore threshold selection should include a detailed analysis of the network and its traffic. We also highlight that TCP and UDP flows should be classified with different thresholds because HHs exhibit different characteristics under the two protocols. Lastly, we point out that using more than one threshold improves the accuracy and efficacy of HH classification.
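To make the last two points concrete, the following is a minimal Python sketch of a classifier that keeps separate thresholds for TCP and UDP and applies more than one threshold per flow. It is an illustration under stated assumptions, not the method evaluated in the paper: the `Flow` type, the choice of flow size and duration as features, and every numeric value in `THRESHOLDS` are hypothetical placeholders that would need to be derived from an analysis of the target network's traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; the fields are illustrative assumptions."""
    protocol: str      # transport protocol, e.g. "TCP" or "UDP"
    size_bytes: int    # total bytes carried by the flow
    duration_s: float  # flow duration in seconds


# Placeholder per-protocol thresholds (NOT values from the paper).
# Each protocol gets its own limits, and each flow is checked against
# more than one threshold (here: size and duration).
THRESHOLDS = {
    "TCP": {"size_bytes": 10_000_000, "duration_s": 10.0},
    "UDP": {"size_bytes": 1_000_000, "duration_s": 5.0},
}


def is_heavy_hitter(flow: Flow, thresholds: dict = THRESHOLDS) -> bool:
    """Return True if the flow meets every threshold set for its protocol."""
    limits = thresholds.get(flow.protocol)
    if limits is None:
        # No thresholds defined for this protocol: treat as non-HH.
        return False
    return (
        flow.size_bytes >= limits["size_bytes"]
        and flow.duration_s >= limits["duration_s"]
    )
```

With these placeholder values, a 5 MB flow lasting 30 s would be labelled an HH if it is UDP but not if it is TCP, which is the kind of protocol-dependent outcome a single universal threshold cannot express.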



Notes

  1. https://www.wireshark.org/docs/man-pages/tshark.html.

  2. https://www.nfstream.org/.

  3. https://www.ntop.org/products/deep-packet-inspection/ndpi/.


Acknowledgements

A. Duque-Torres was supported by the ISIF Internet Operations Research Grant (Project #E3164). A. Pekar and W.K.G. Seah were supported by VUW’s Huawei NZ Research Programme, Software-Defined Green Internet of Things (Project #E2881). A. Pekar completed his part of this work as a Postdoctoral Fellow at the School of Engineering and Computer Science, Victoria University of Wellington, New Zealand. A. Duque-Torres completed her part of this work at the University of Cauca, Colombia and the Victoria University of Wellington, New Zealand.

Author information

Contributions

All the authors participated in the conception and design of the work. Furthermore, all the authors believe that the manuscript represents valid work.

Corresponding author

Correspondence to Oscar Caicedo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Funding

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pekar, A., Duque-Torres, A., Seah, W.K.G. et al. Knowledge Discovery: Can It Shed New Light on Threshold Definition for Heavy-Hitter Detection? J Netw Syst Manage 29, 24 (2021). https://doi.org/10.1007/s10922-021-09593-w

  • DOI: https://doi.org/10.1007/s10922-021-09593-w
