Abstract
With the rapid growth in the volume of encrypted network traffic, and with more malware samples using encryption to evade identification, detecting encrypted malicious traffic has become challenging. The quality of the encrypted traffic sampling method directly affects the result of malware detection, but most existing machine learning methods for sampling flow-based encrypted traffic data are inherently inaccurate. To address these problems, a three-stage hierarchical sampling approach based on an improved density peaks clustering algorithm (THS-IDPC) is proposed to improve the accuracy and efficiency of encrypted malicious traffic detection models. First, we propose an improved density peaks clustering algorithm based on grid screening, a custom center decision value and mutual neighbor degree (DPC-GS-MND). In DPC-GS-MND, grid screening reduces the computational complexity and the mutual neighbor degree improves the clustering accuracy. Then, we extract and analyze three categories of features of encrypted traffic data related to malicious activities, and adopt a three-layer hierarchical clustering algorithm based on DPC-GS-MND. Finally, a three-stage sampling approach based on the three-layer hierarchical clustering algorithm (THS-IDPC) is proposed to sample the encrypted traffic data for further deep detection. The experimental results demonstrate that the proposed THS-IDPC is highly effective at filtering normal traffic out of massive encrypted network traffic, and that an encrypted malicious traffic detection model using the THS-IDPC sampling method can detect multiple encrypted malicious traffic families with higher accuracy and efficiency. In addition, DPC-GS-MND and THS-IDPC have good application prospects in network intrusion detection systems in big data environments.
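For readers unfamiliar with density peaks clustering, the following is a minimal sketch of the baseline algorithm (Rodriguez and Laio, Science, 2014) that DPC-GS-MND builds on. It is not the proposed DPC-GS-MND: grid screening, the custom center decision value and the mutual neighbor degree are not specified in the abstract and are not reproduced here. The function name `density_peaks`, the cutoff-count density, and the decision value `gamma = rho * delta` are assumptions taken from the baseline method for illustration only.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Baseline density peaks clustering (illustrative sketch, not DPC-GS-MND).

    points    -- list of equal-length numeric tuples
    d_c       -- cutoff distance used to count neighbors
    n_centers -- number of cluster centers to select
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point that ranks higher in density.
    # The densest point gets the largest distance to any other point.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Baseline decision value: a center has both high rho and high delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], i))[:n_centers]

    # Each non-center inherits the label of its nearest denser neighbor.
    # The densest point always has the top gamma, so it is always a center.
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, gamma
```

Calling `density_peaks(points, d_c, n_centers)` on a list of feature vectors returns one cluster label per point, the indices of the chosen centers, and the decision values. The full pairwise distance matrix makes this sketch quadratic in the number of points, which is the cost that the grid screening step described above is meant to reduce.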
Acknowledgements
This research was supported by the Natural Science Foundation of China (Nos. 61802404, 61602470), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02040100), the Fundamental Research Funds for the Central Universities of China University of Labor Relations (Nos. 20ZYJS017, 20XYJS003), and the Key Research Program of the Beijing Municipal Science and Technology Commission (No. D181100000618003). This research was also partially supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology. We would like to express our sincere gratitude to the anonymous reviewers for their constructive feedback, which helped improve the quality of this paper. We would also like to thank Yao Yepeng, Su Liya and Jiang Bo for reviewing our work and providing helpful comments.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chen, L., Gao, S., Liu, B. et al. THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection. J Supercomput 76, 7489–7518 (2020). https://doi.org/10.1007/s11227-020-03372-1
DOI: https://doi.org/10.1007/s11227-020-03372-1