Abstract
Network traffic classification is an enabling technique for network security and management in both traditional networks and emerging networks such as the Internet of Things. As traditional port-based and payload-based methods become less effective, much research attention has been devoted to an alternative approach based on flow-level and packet-level traffic characteristics. A variety of statistical classification schemes have been proposed in this context, but most of them rest on an implicit assumption that all protocols are known in advance and well represented in the training data. This assumption is unrealistic, because real-world networks constantly see new traffic patterns and protocols that were previously unknown. In this paper, we revisit the problem by proposing a learning scheme with unknown pattern extraction for statistical protocol identification. The scheme is designed for a more realistic setting, in which the training data consists only of labeled samples from a limited number of protocols, and the goal is to identify these known patterns in an arbitrary traffic mixture of known and unknown protocols. Our experiments on real-world traffic show that the proposed scheme outperforms previous approaches by accurately identifying both known and unknown protocols.
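The setting described above, where only a few protocols are labeled and test traffic may come from unseen ones, can be illustrated with a minimal sketch. This is not the scheme proposed in the paper; it swaps in a simple nearest-centroid classifier with distance-based rejection purely to show what "known versus unknown" means in practice. The feature choice (mean packet size, mean inter-arrival time), the toy values, the `slack` parameter, and all function names are assumptions made for illustration.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(train):
    """Build a per-protocol (centroid, radius) model from labeled flows.

    train maps a protocol name to a list of feature vectors. The radius is
    the largest distance from the centroid to any training flow of that
    protocol (an illustrative choice, not the paper's method).
    """
    model = {}
    for proto, rows in train.items():
        c = centroid(rows)
        radius = max(math.dist(c, r) for r in rows)
        model[proto] = (c, radius)
    return model

def classify(model, flow, slack=1.5):
    """Return the nearest known protocol, or "unknown" if the flow lies
    farther than slack * radius from that protocol's centroid."""
    best, best_d = None, math.inf
    for proto, (c, _) in model.items():
        d = math.dist(c, flow)
        if d < best_d:
            best, best_d = proto, d
    _, radius = model[best]
    return best if best_d <= slack * radius else "unknown"

# Hypothetical toy flows: [mean packet size (bytes), mean inter-arrival (ms)].
train = {
    "dns": [[80, 5], [90, 6], [85, 4]],
    "http": [[900, 40], [1000, 50], [950, 45]],
}
model = fit(train)

print(classify(model, [88, 5]))     # close to the labeled DNS flows
print(classify(model, [960, 47]))   # close to the labeled HTTP flows
print(classify(model, [400, 200]))  # far from every known protocol
```

A closed-set classifier would be forced to assign the third flow to one of the two labeled protocols; the rejection step is what lets traffic from an unseen protocol be reported as unknown instead.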





Acknowledgements
This work is supported by NSFC Projects 61802080 and 61872102.
Cite this article
Wang, Y., Xue, H., Liu, Y. et al. Statistical network protocol identification with unknown pattern extraction. Ann. Telecommun. 74, 473–482 (2019). https://doi.org/10.1007/s12243-019-00704-y
DOI: https://doi.org/10.1007/s12243-019-00704-y