Clustering unknown network traffic with dual-path autoencoder

  • Original Article
Neural Computing and Applications

Abstract

The proportion of unknown traffic in networks continues to increase, which poses great challenges to the management and security of cyberspace. Unknown traffic is network traffic generated by protocols that are not known to a preconstructed traffic identification system. Measures to address this challenge can be developed once the mixed unknown traffic is grouped into multiple clusters, where, ideally, each cluster contains just one traffic class. In this paper, we propose a novel scheme for clustering unknown traffic, named dual-path autoencoder-based clustering, to discover protocol-based traffic classes. The dual-path autoencoder model combines a convolutional autoencoder with a deep autoencoder to extract and aggregate payload features and statistical features. The fused feature is then clustered by a correlation-adjusted clustering module, which divides the unknown traffic flows into multiple high-purity clusters. To evaluate our scheme, we conduct experiments on two public network traffic datasets and one campus network dataset. Using seven common application-layer protocols to simulate unknown traffic, the evaluation results show that our scheme can achieve above 98% on each dataset when the preset number of clusters is 60. These results demonstrate the effectiveness of the proposed scheme for clustering unknown network protocols.
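The abstract describes the goal as "high-purity clusters" but does not define how purity is measured. As a minimal sketch, assuming the common definition of cluster purity (the share of flows whose true class equals the majority class of their cluster), the quantity could be computed as follows. The function name `cluster_purity` and the toy labels are illustrative assumptions, not taken from the article.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return the fraction of flows that carry the majority label of their cluster.

    Assumption: this is the textbook purity measure; the article may
    define or weight it differently.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one flow is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, true_labels):
        label_counts[cluster_id][label] += 1

    # Each cluster contributes the size of its largest single-label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 flows, 3 clusters, 3 protocol labels.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["http", "http", "dns", "dns", "dns", "ssh", "ssh", "http"]
print(cluster_purity(clusters, labels))  # 0.75
```

Under this definition, splitting data into more clusters can never lower the best achievable purity, which is one plausible reason to preset more clusters (60) than there are protocols (seven) in the evaluation.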

Data availability

The ISCX2012 and ISCX_Botnet datasets that support the findings of this study are available at https://www.unb.ca/cic/datasets/. The selfDataset that supports the findings of this study is available on request from the corresponding author. It is not publicly available because it contains information that could compromise research participant privacy.

Notes

  1. https://www.ntop.org/products/packet-capture/pf_ring/.

Author information

Corresponding author

Correspondence to Fengyu Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fu, Y., Li, X., Li, X. et al. Clustering unknown network traffic with dual-path autoencoder. Neural Comput & Applic 35, 8955–8966 (2023). https://doi.org/10.1007/s00521-022-08138-9

  • DOI: https://doi.org/10.1007/s00521-022-08138-9