Machine Learning Based Classification Accuracy of Encrypted Service Channels: Analysis of Various Factors

Journal of Network and Systems Management

Abstract

Visibility into network traffic is a key requirement for security and network monitoring tools. Recent trends in the evolution of Internet traffic make it difficult for traditional traffic analysis methods to accurately classify Internet traffic, including Voice over IP (VoIP), text messaging, video, and audio services, among others. A key aspect of this trend is the rising level of encrypted multiple service channels, whose payload is opaque to middleboxes in the network. In such scenarios, traditional approaches such as Deep Packet Inspection (DPI) or examination of port numbers cannot achieve the required classification accuracy. This work investigates Machine Learning-based network traffic classifiers as a means of accurately classifying encrypted multiple service channels. The study (i) proposes and evaluates two machine learning-based frameworks for analyzing multiple service channels; (ii) undertakes feature engineering to identify the minimum number of features required to obtain high accuracy while reducing the effects of over-fitting; (iii) explores the portability and robustness of the frameworks' trained models under different network conditions: location, time, and volume; and (iv) collects and analyzes a large-scale dataset including nine classes of services, for benchmarking purposes.

Notes

  1. We use classifier abbreviations in Table 7 for the sake of brevity. We consider Random Forest (RF), Decision Tree (DT), Complement Naive Bayes (CNB), Multinomial Naive Bayes (MNB), K-Nearest Neighbors (KNN), Bernoulli Naive Bayes (BNB), Linear Support-Vector Machine (LSVM), Classifier using Ridge Regression (RR), NearestCentroid (NC), Support-Vector Machine (SVM), Passive-Aggressive (PR), Perceptron (P), and Linear Models with Stochastic Gradient Descent (LSGD).

  2. https://scikit-learn.org/stable/index.html.

  3. The detailed result is omitted for the sake of brevity.
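The classifier abbreviations in Note 1 can be tied to scikit-learn estimators along the following lines. This is a minimal sketch, not the authors' configuration: the pairing of each abbreviation with a scikit-learn class is an assumption inferred from the classifier names, the names `CLASSIFIERS` and `make_classifier` are hypothetical, and no hyperparameters from the study are implied.

```python
from importlib import import_module

# Abbreviation -> (module, class name).
# ASSUMPTION: this pairing is inferred from the names listed in Note 1;
# the article does not state which scikit-learn classes were used.
CLASSIFIERS = {
    "RF": ("sklearn.ensemble", "RandomForestClassifier"),
    "DT": ("sklearn.tree", "DecisionTreeClassifier"),
    "CNB": ("sklearn.naive_bayes", "ComplementNB"),
    "MNB": ("sklearn.naive_bayes", "MultinomialNB"),
    "KNN": ("sklearn.neighbors", "KNeighborsClassifier"),
    "BNB": ("sklearn.naive_bayes", "BernoulliNB"),
    "LSVM": ("sklearn.svm", "LinearSVC"),
    "RR": ("sklearn.linear_model", "RidgeClassifier"),
    "NC": ("sklearn.neighbors", "NearestCentroid"),
    "SVM": ("sklearn.svm", "SVC"),
    "PR": ("sklearn.linear_model", "PassiveAggressiveClassifier"),
    "P": ("sklearn.linear_model", "Perceptron"),
    "LSGD": ("sklearn.linear_model", "SGDClassifier"),
}


def make_classifier(abbr, **params):
    """Return an unfitted estimator for an abbreviation.

    scikit-learn is imported lazily, so the table above can be inspected
    even where the library is not installed.
    """
    module_name, class_name = CLASSIFIERS[abbr]  # KeyError for unknown abbreviations
    estimator_cls = getattr(import_module(module_name), class_name)
    return estimator_cls(**params)
```

With scikit-learn installed, `make_classifier("RF", n_estimators=100)` would return an unfitted random forest; keeping the table as plain strings makes it easy to loop over all thirteen abbreviations when comparing classifiers.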

Acknowledgements

This research is supported by the Mitacs (IT11704) and Solana Networks funding program. The research is conducted as part of the Dalhousie NIMS Lab (https://projects.cs.dal.ca/projectx/).

Author information

Corresponding author

Correspondence to Ali Safari Khatouni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Safari Khatouni, A., Seddigh, N., Nandy, B. et al. Machine Learning Based Classification Accuracy of Encrypted Service Channels: Analysis of Various Factors. J Netw Syst Manage 29, 8 (2021). https://doi.org/10.1007/s10922-020-09566-5

  • DOI: https://doi.org/10.1007/s10922-020-09566-5
