THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection

The Journal of Supercomputing

Abstract

With the rapid growth in the volume of encrypted network traffic and in the number of malware samples that use encryption to evade identification, detecting encrypted malicious traffic has become challenging. The quality of the encrypted traffic sampling method directly affects the result of malware detection, but most existing machine learning methods for sampling flow-based encrypted traffic data are inherently inaccurate. To solve these problems, an innovative three-stage hierarchical sampling approach based on an improved density peaks clustering algorithm (THS-IDPC) is proposed to enhance the accuracy and efficiency of encrypted malicious traffic detection models. First, we propose an improved density peaks clustering algorithm based on grid screening, a custom center decision value, and mutual neighbor degree (DPC-GS-MND). In DPC-GS-MND, grid screening reduces the computational complexity and the mutual neighbor degree improves the clustering accuracy. Then, we extract and study three categories of features of encrypted traffic data related to malicious activities, and adopt a three-layer hierarchical clustering algorithm based on DPC-GS-MND. Finally, a three-stage sampling approach based on the three-layer hierarchical clustering algorithm (THS-IDPC) is proposed to sample the encrypted traffic data for further deep detection. The experimental results demonstrate that the proposed THS-IDPC effectively filters normal traffic out of massive encrypted network traffic, and that the encrypted malicious traffic detection model with the THS-IDPC sampling method can detect multiple encrypted malicious traffic families with higher accuracy and efficiency. Moreover, DPC-GS-MND and THS-IDPC have good application prospects in network intrusion detection systems in big data environments.
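For orientation, the following is a minimal sketch of the baseline density peaks clustering algorithm of Rodriguez and Laio (2014), which DPC-GS-MND builds on. It is not the authors' method: grid screening, the custom center decision value, and the mutual neighbor degree are not defined in the abstract and are therefore not reproduced here. The function name `density_peaks`, the parameters `d_c` and `n_centers`, and the use of the product of density and distance as the decision value are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Baseline density peaks clustering (illustrative sketch only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n   # distance to the nearest point that ranks denser
    parent = [-1] * n   # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour, so by convention
            # its delta is its largest distance to any other point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Assumed decision value: gamma_i = rho_i * delta_i.
    # The points with the largest gamma are taken as cluster centers.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Every remaining point inherits the label of its nearest denser
    # neighbour, which has already been labelled because of the visit order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two well-separated toy groups of five 2-D points each (hypothetical data).
group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peaks(group_a + group_b, d_c=1.0, n_centers=2)
```

The sketch computes the full pairwise distance matrix, so its cost grows quadratically with the number of points. This is the kind of cost that the abstract says grid screening is meant to reduce.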

Acknowledgements

This research was supported by the Natural Science Foundation of China (Nos. 61802404, 61602470), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02040100), the Fundamental Research Funds for the Central Universities of China University of Labor Relations (Nos. 20ZYJS017, 20XYJS003), and the Key Research Program of the Beijing Municipal Science and Technology Commission (No. D181100000618003). It was also partially supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology. We sincerely thank the anonymous reviewers for their constructive feedback, which helped improve the quality of this paper. We also thank Yao Yepeng, Su Liya and Jiang Bo for reviewing our work and providing helpful comments.

Author information

Corresponding authors

Correspondence to Liangchen Chen or Baoxu Liu.

About this article

Cite this article

Chen, L., Gao, S., Liu, B. et al. THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection. J Supercomput 76, 7489–7518 (2020). https://doi.org/10.1007/s11227-020-03372-1

  • DOI: https://doi.org/10.1007/s11227-020-03372-1
