THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection

Chen, Liangchen; Gao, Shu; Liu, Baoxu; Lu, Zhigang; Jiang, Zhengwei

doi:10.1007/s11227-020-03372-1

Published: 29 June 2020

Volume 76, pages 7489–7518 (2020)

The Journal of Supercomputing

Liangchen Chen^1,2,3 (ORCID: orcid.org/0000-0002-3417-5840),
Shu Gao^1,
Baoxu Liu^2,4,
Zhigang Lu^2,4 &
Zhengwei Jiang^2,4


Abstract

With the rapid growth of encrypted network traffic and of malware that uses encryption to evade identification, detecting encrypted malicious traffic has become challenging. The quality of the encrypted traffic sampling method directly affects the result of malware detection, but most existing machine learning methods for sampling flow-based encrypted traffic data are inherently inaccurate. To address these problems, a three-stage hierarchical sampling approach based on an improved density peaks clustering algorithm (THS-IDPC) is proposed to enhance the accuracy and efficiency of encrypted malicious traffic detection models. First, we propose an improved density peaks clustering algorithm based on grid screening, a custom center decision value, and mutual neighbor degree (DPC-GS-MND). In DPC-GS-MND, grid screening reduces the computational complexity and the mutual neighbor degree improves the clustering accuracy. Then, we extract and study three categories of features of encrypted traffic data related to malicious activities, and adopt a three-layer hierarchical clustering algorithm based on DPC-GS-MND. Finally, a three-stage sampling approach based on the three-layer hierarchical clustering algorithm (THS-IDPC) is proposed to sample the encrypted traffic data for further deep detection. The experimental results demonstrate that THS-IDPC effectively filters normal traffic out of massive encrypted network traffic and that, at the same time, an encrypted malicious traffic detection model using the THS-IDPC sampling method detects multiple encrypted malicious traffic families with higher accuracy and efficiency. DPC-GS-MND and THS-IDPC also have good application prospects in network intrusion detection systems in big data environments.
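For readers unfamiliar with the clustering family that DPC-GS-MND builds on, the following is a minimal sketch of the baseline density peaks clustering procedure only. It does not implement grid screening, the custom center decision value, or the mutual neighbor degree, whose definitions are not given in this abstract. The function name `density_peaks`, the cutoff parameter `d_c`, the choice of `rho * delta` as the decision value, and the toy points are all assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Baseline density peaks clustering with a cutoff kernel (illustrative only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n    # distance to the nearest point ranked denser
    parent = [-1] * n    # index of that denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Decision value gamma = rho * delta; the largest values mark cluster centers.
    # Sorting `order` (stable) keeps the densest point first among ties.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Each remaining point inherits the label of its denser parent,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    # Two toy groups of 2-D points standing in for flow feature vectors.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centers = density_peaks(pts, d_c=1.2, n_centers=2)
    print(labels, centers)
```

This sketch computes all pairwise distances, so its cost grows quadratically with the number of points. That cost is the kind of overhead the abstract says grid screening is meant to reduce.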



Acknowledgements

This research was supported by the Natural Science Foundation of China (Nos. 61802404, 61602470), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02040100), the Fundamental Research Funds for the Central Universities of China University of Labor Relations (Nos. 20ZYJS017, 20XYJS003), and the Key Research Program of the Beijing Municipal Science and Technology Commission (No. D181100000618003). It was also partially supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology. We would like to express our sincere gratitude to the anonymous reviewers for their constructive feedback, which helped improve the quality of this paper. We would also like to thank Yao Yepeng, Su Liya and Jiang Bo for reviewing our work and providing helpful comments.

Author information

Authors and Affiliations

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430063, China
Liangchen Chen & Shu Gao
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
Liangchen Chen, Baoxu Liu, Zhigang Lu & Zhengwei Jiang
School of Applied Technology, China University of Labor Relations, Beijing, 100048, China
Liangchen Chen
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, 100049, China
Baoxu Liu, Zhigang Lu & Zhengwei Jiang

Corresponding authors

Correspondence to Liangchen Chen or Baoxu Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, L., Gao, S., Liu, B. et al. THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection. J Supercomput 76, 7489–7518 (2020). https://doi.org/10.1007/s11227-020-03372-1

Published: 29 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11227-020-03372-1
