
Feature Engineering and Ensemble Learning-Based Classification of VPN and Non-VPN-Based Network Traffic over Temporal Features

  • Original Research
SN Computer Science

Abstract

With the rapid advancement of technology, the constant emergence of new applications and services has driven a drastic increase in Internet traffic. This makes it increasingly difficult for network analysts to maintain network security and classify traffic, especially when the traffic is encrypted or tunneled. To address this issue, the proposed strategy aims to distinguish regular traffic from traffic tunneled through a virtual private network (VPN) and to characterize traffic from seven different applications. The approach uses several ensemble machine learning techniques, which are accurate and require less computational time for training and prediction than conventional machine learning and deep learning models. These models were applied to both the classification and the characterization of network traffic and produced strong results. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) performed well in multiclass classification, while AdaBoost and LightGBM performed well in binary classification. When all the datasets were merged into two classes and various feature engineering methods were applied, the proposed system achieved an accuracy above 99% with minimal error scores. This result was obtained using LightGBM with min–max scaling under stratified fivefold cross-validation, and it outperforms existing approaches. This research highlights the efficiency and potential of the proposed model for classifying VPN and non-VPN network traffic.
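The abstract reports LightGBM with min–max scaling evaluated under stratified fivefold cross-validation. Below is a minimal sketch of that evaluation protocol, built on several assumptions. The scaler's per-feature minimum and maximum are fitted on the training folds only and then reused on the held-out fold. This is a common way to avoid leakage, but the paper does not say it is the authors' exact procedure. The classifier is a nearest-centroid stand-in, not LightGBM, so that the sketch runs on the Python standard library alone. The feature names and toy data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_kfold(labels, k=5, seed=0):
    """Return k lists of test indices; each fold keeps the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal this class's indices round-robin so every fold gets an equal share.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def fit_min_max(rows):
    """Learn a per-feature (min, max) pair from training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, params):
    """Scale features with the training min/max; constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, params)]
        for row in rows
    ]


def fit_centroids(X, y):
    """Stand-in classifier (NOT LightGBM): the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def cross_validate(X, y, k=5, seed=0):
    """Per-fold accuracy; the scaler is refitted inside every fold on training data only."""
    scores = []
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        params = fit_min_max([X[i] for i in train_idx])
        X_train = apply_min_max([X[i] for i in train_idx], params)
        X_test = apply_min_max([X[i] for i in test_idx], params)
        model = fit_centroids(X_train, [y[i] for i in train_idx])
        correct = sum(predict(model, row) == y[i] for row, i in zip(X_test, test_idx))
        scores.append(correct / len(test_idx))
    return scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy flows: [duration_s, bytes_per_s]; label 1 = VPN, 0 = non-VPN.
    X = [[rng.uniform(0, 1), rng.uniform(0, 100)] for _ in range(50)]
    X += [[rng.uniform(5, 6), rng.uniform(900, 1000)] for _ in range(50)]
    y = [0] * 50 + [1] * 50
    scores = cross_validate(X, y)
    print([round(s, 2) for s in scores], round(mean(scores), 3))
```

In a real pipeline, the stand-in would be replaced by a gradient-boosting model such as `lightgbm.LGBMClassifier`. scikit-learn's `StratifiedKFold` and `MinMaxScaler` (wrapped in a `Pipeline`) provide the same fold-splitting and fit-on-train-only scaling steps.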


Figs. 1–11



Funding

The authors declare that they received no grant to support this research.

Author information

Corresponding author

Correspondence to Parvinder Singh.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Communication and Network Technologies” guest edited by Anshul Verma, Pradeepika Verma and Kiran Kumar Pattanaik.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Abbas, G., Farooq, U., Singh, P. et al. Feature Engineering and Ensemble Learning-Based Classification of VPN and Non-VPN-Based Network Traffic over Temporal Features. SN COMPUT. SCI. 4, 546 (2023). https://doi.org/10.1007/s42979-023-01944-5


  • DOI: https://doi.org/10.1007/s42979-023-01944-5
