Abstract
Accurate network traffic classification is vital to network security. Many applications use dynamic ports and encryption to avoid detection, so earlier methods such as port-based and payload-based classification fall short. An alternative approach is to use machine learning (ML) techniques. Here we present three clustering algorithms, K-Means, FarthestFirst and Canopy, applied to statistical flow features of applications. A central topic of our discussion is how preprocessing the data set with the PCA dimension reduction algorithm affects the performance of these three algorithms. Our results show that both classification accuracy and computational performance improve significantly after dimension reduction.
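As a rough illustration of one of the three algorithms named above, the following is a minimal sketch of farthest-first traversal, the k-center heuristic on which FarthestFirst clustering is commonly based. It is not the implementation evaluated in the paper: the function names, the choice of the first center, the Euclidean distance, and the two-feature sample flows are all assumptions made for this example.

```python
import math


def farthest_first_centers(points, k, first_index=0):
    """Pick k centers by farthest-first traversal (a k-center heuristic).

    first_index is a hypothetical parameter: here the first center is fixed
    so that the example is deterministic.
    """
    centers = [points[first_index]]
    # dist_to_nearest[i] is the distance from points[i] to its closest center so far.
    dist_to_nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from every center chosen so far.
        next_index = max(range(len(points)), key=dist_to_nearest.__getitem__)
        centers.append(points[next_index])
        for i, p in enumerate(points):
            dist_to_nearest[i] = min(dist_to_nearest[i], math.dist(p, centers[-1]))
    return centers


def assign_clusters(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in points
    ]


# Made-up two-feature flow records, for illustration only (not data from the paper).
flows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers = farthest_first_centers(flows, k=2)
labels = assign_clusters(flows, centers)
print(centers)
print(labels)
```

Unlike K-Means, this sketch makes a single pass per center and never recomputes centroids, which is why farthest-first methods are usually cheap to run. In a setting like the one the abstract describes, the same functions would be applied to flow feature vectors before and after dimension reduction to compare the resulting clusters.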
Supported by Graduate Innovation Capacity Development Funding Program of Guangzhou University (2018GDJC-M15).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Xue, H., Wei, G., Wu, L., Wang, Y. (2019). A Comparative Study on Network Traffic Clustering. In: Liu, J., Huang, X. (eds) Network and System Security. NSS 2019. Lecture Notes in Computer Science, vol 11928. Springer, Cham. https://doi.org/10.1007/978-3-030-36938-5_27
DOI: https://doi.org/10.1007/978-3-030-36938-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36937-8
Online ISBN: 978-3-030-36938-5
eBook Packages: Computer Science, Computer Science (R0)