Seq2Path: a sequence-to-path-based flow feature fusion approach for encrypted traffic classification

Cluster Computing

This article has been updated

Abstract

With growing awareness of user privacy and communication security, the volume of encrypted traffic has increased dramatically. Flow statistics-based methods can classify encrypted traffic by using flow-level information. However, these methods require a large number of packets and manual selection of statistical features. In this paper, we propose a novel encrypted traffic classification method, Seq2Path, which fuses flow features by using path signature theory to translate feature sequences into a traffic path. The statistical features of the traffic path are then generated by computing its signature, and these features are finally used to train a machine learning classifier. Our experiments on four datasets containing three types of traffic (HTTPS, VPN and Tor) show that Seq2Path achieves stable performance and generally outperforms state-of-the-art methods.
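
The sequence-to-path idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of packet lengths and inter-arrival times as the two feature sequences, the truncation at depth 2, and the helper names `flow_to_path` and `signature_depth2` are all assumptions made for illustration. The sketch stacks the feature sequences into a piecewise-linear path and computes its truncated signature with Chen's identity, yielding a fixed-length vector that could be passed to any machine learning classifier.

```python
import numpy as np


def flow_to_path(packet_lengths, inter_arrival_times):
    """Stack two per-packet feature sequences into a 2-D path.

    Assumption: the feature choice is illustrative only; the abstract
    does not say which flow features are fused.
    """
    return np.column_stack([packet_lengths, inter_arrival_times]).astype(float)


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    `path` has shape (n_points, d). The result has d + d*d entries:
    the level-1 terms followed by the flattened level-2 terms.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # one increment per linear segment
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in increments:
        # Chen's identity: extend the signature by one linear segment.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])


# Hypothetical flow: four packets with lengths in bytes and gaps in seconds.
lengths = [100, 1500, 60, 1500]
gaps = [0.0, 0.02, 0.01, 0.05]
features = signature_depth2(flow_to_path(lengths, gaps))
print(features.shape)  # (6,)
```

The feature vector has the same length regardless of how many packets the flow contains, which is what lets flows of different lengths share one classifier input format. In practice, a dedicated signature library would likely replace the hand-written loop.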

Data availability

The datasets analyzed and the code used during the current study are available from the corresponding author on reasonable request.

Change history

  • 05 September 2022

    The original online version of this article was revised: the missing biography and photo of the author Jian Weng have been added.

Acknowledgements

The authors would like to thank Wazen Shbair et al. for the public datasets, and Patrick Kidger and Terry Lyons for publicly sharing their path signature code.

Funding

This work was partially supported by NSFC (Grant No. 92067108) and the Natural Science Foundation of Guangdong Province (Grant No. 2021A1515011314).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by CJ and SX. The first draft of the manuscript was written by CJ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guanggang Geng.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, C., Xu, S., Geng, G. et al. Seq2Path: a sequence-to-path-based flow feature fusion approach for encrypted traffic classification. Cluster Comput 26, 1785–1800 (2023). https://doi.org/10.1007/s10586-022-03709-w

  • DOI: https://doi.org/10.1007/s10586-022-03709-w
