Abstract
With growing awareness of user privacy and communication security, the volume of encrypted traffic has increased dramatically. Flow statistics-based methods can classify encrypted traffic using flow-level information. However, these methods require a large number of packets and manual selection of statistical features. In this paper, we propose a novel encrypted traffic classification method, Seq2Path, which fuses flow features by using path signature theory to translate feature sequences into a traffic path. The statistical features of the traffic path are then generated by computing its signature, and these features are used to train a machine learning classifier. Our experiments on four datasets containing three types of traffic (HTTPS, VPN and Tor) show that Seq2Path achieves stable performance and generally outperforms state-of-the-art methods.
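The sequence-to-path idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of packet lengths and inter-arrival times as the two feature sequences, the cumulative-sum path construction, the truncation at depth 2, and the function names `flow_to_path` and `signature_depth2` are all assumptions made for illustration. The sketch only shows how a piecewise-linear path built from feature sequences yields a fixed-length signature vector that could then be passed to a classifier.

```python
import numpy as np


def flow_to_path(packet_lengths, inter_arrival_times):
    """Build a 2-D path from two per-packet feature sequences.

    Assumption: the path is the running sum of the features, started at the
    origin. The paper's actual path construction may differ.
    """
    feats = np.column_stack([packet_lengths, inter_arrival_times]).astype(float)
    return np.vstack([np.zeros(feats.shape[1]), np.cumsum(feats, axis=0)])


def signature_depth2(path):
    """Signature of a piecewise-linear path, truncated at depth 2.

    path has shape (n_points, d). The result has d + d*d entries:
    the d first-level terms followed by the flattened d x d second level.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)        # one increment per segment
    level1 = increments.sum(axis=0)           # total displacement
    offsets = path[:-1] - path[0]             # displacement before each segment
    # Iterated integrals: cross terms between earlier and later segments,
    # plus the within-segment term 0.5 * delta (outer) delta.
    level2 = offsets.T @ increments + 0.5 * (increments.T @ increments)
    return np.concatenate([level1, level2.ravel()])


# Hypothetical flow: three packets with lengths in bytes and gaps in seconds.
path = flow_to_path([100, 1500, 60], [0.0, 0.01, 0.2])
features = signature_depth2(path)             # fixed-length feature vector
```

The length of the feature vector depends only on the number of feature channels and the truncation depth, not on the number of packets, which is what makes flows of different lengths directly comparable.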
Data availability
The datasets analyzed and the code used during the current study are available from the corresponding author on reasonable request.
Change history
05 September 2022
The original online version of this article was revised: the missing biography and photo of the author Jian Weng have been added.
Acknowledgements
The authors would like to thank Wazen Shbair et al. for the public datasets, and Patrick Kidger and Terry Lyons for publicly sharing their path signature code.
Funding
This work was partially supported by NSFC (Grant No. 92067108) and the Natural Science Foundation of Guangdong Province (Grant No. 2021A1515011314).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by CJ and SX. The first draft of the manuscript was written by CJ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Jiang, C., Xu, S., Geng, G. et al. Seq2Path: a sequence-to-path-based flow feature fusion approach for encrypted traffic classification. Cluster Comput 26, 1785–1800 (2023). https://doi.org/10.1007/s10586-022-03709-w
DOI: https://doi.org/10.1007/s10586-022-03709-w