Abstract
Rapid technological advancement and the constant emergence of new applications and services have sharply increased Internet traffic, making it harder for network analysts to maintain network security and classify traffic, especially when that traffic is encrypted or tunneled. To address this problem, the proposed approach distinguishes regular traffic from traffic tunneled through a virtual private network (VPN) and characterizes traffic from seven different applications. It uses several ensemble machine learning techniques, which are accurate and require less computational time for training and prediction than conventional machine learning and deep learning models. These models were applied to both the classification and the characterization of network traffic. The extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) algorithms performed well in multiclass classification, while AdaBoost and LightGBM performed well in binary classification. When all the datasets were merged and categorized into two classes and various feature engineering methods were applied, the proposed system achieved an accuracy of more than 99%, with minimal error scores, using LightGBM with min–max scaling over stratified fivefold cross-validation, thereby outperforming all existing approaches. These results highlight the efficiency and potential of the proposed model for classifying VPN and non-VPN network traffic.
Funding
The authors declare that they received no grant to support this research.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
This article is part of the topical collection “Research Trends in Communication and Network Technologies” guest edited by Anshul Verma, Pradeepika Verma and Kiran Kumar Pattanaik.
About this article
Cite this article
Abbas, G., Farooq, U., Singh, P. et al. Feature Engineering and Ensemble Learning-Based Classification of VPN and Non-VPN-Based Network Traffic over Temporal Features. SN COMPUT. SCI. 4, 546 (2023). https://doi.org/10.1007/s42979-023-01944-5
DOI: https://doi.org/10.1007/s42979-023-01944-5