HSLF: HTTP Header Sequence Based LSH Fingerprints for Application Traffic Classification

Tang, Zixian; Wang, Qiang; Li, Wenhao; Bao, Huaifeng; Liu, Feng; Wang, Wen

doi:10.1007/978-3-030-77961-0_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12742)

Included in the following conference series:

International Conference on Computational Science

Abstract

Distinguishing among the ever-growing variety of network applications is a challenging task in network management that has been studied extensively for many years. Unfortunately, previous work on HTTP traffic classification relies heavily on prior knowledge and operates at a coarse granularity, and is therefore limited in detecting newly emerging applications and evolving network behaviors. In this paper, we propose HSLF, a hierarchical system that employs application fingerprints to classify HTTP traffic. Specifically, we employ a locality-sensitive hashing (LSH) algorithm to obtain the importance of each field in the HTTP header, from which a rational weight allocation scheme and a fingerprint for each HTTP session are generated. Then, similarities between the fingerprints of each application are calculated to classify unknown HTTP traffic. On a real-world dataset, HSLF achieves an accuracy of 96.6%, outperforming classic machine learning methods and state-of-the-art models.
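
The general idea of weighted LSH fingerprints over HTTP header fields can be illustrated with a minimal sketch. It assumes a SimHash-style scheme, which is one common LSH family; the abstract does not say which LSH variant, hash function, or weights HSLF actually uses. The function names (`simhash`, `similarity`, `classify`), the 64-bit fingerprint width, the MD5-based token hash, and the example weights are all hypothetical choices made for illustration, not the authors' method.

```python
import hashlib
from typing import Dict, List

BITS = 64  # fingerprint width; an assumption for this sketch


def _hash64(token: str) -> int:
    """Map a token to a stable 64-bit integer (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(headers: Dict[str, str], weights: Dict[str, float]) -> int:
    """Build a weighted SimHash fingerprint from HTTP header fields.

    Each "name:value" pair votes on every bit position with the weight
    assigned to its field name (default 1.0 if the field has no weight).
    """
    acc = [0.0] * BITS
    for name, value in headers.items():
        key = name.lower()
        w = weights.get(key, 1.0)
        h = _hash64(f"{key}:{value}")
        for i in range(BITS):
            acc[i] += w if (h >> i) & 1 else -w
    fingerprint = 0
    for i, vote in enumerate(acc):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def similarity(a: int, b: int) -> float:
    """Similarity in [0, 1] derived from the Hamming distance of two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def classify(fp: int, known: Dict[str, List[int]]) -> str:
    """Return the application whose stored fingerprints are most similar to fp.

    Every application in `known` must have at least one fingerprint.
    """
    return max(known, key=lambda app: max(similarity(fp, k) for k in known[app]))


# Hypothetical field weights and sessions, for illustration only.
weights = {"user-agent": 3.0, "host": 2.0, "accept": 1.0}
session_a = {"Host": "api.example.com", "User-Agent": "AppA/1.0", "Accept": "*/*"}
session_b = {"Host": "cdn.example.org", "User-Agent": "AppB/2.3", "Accept": "text/html"}

known = {
    "app_a": [simhash(session_a, weights)],
    "app_b": [simhash(session_b, weights)],
}
print(classify(simhash(session_a, weights), known))
```

In this sketch a higher weight lets a field dominate more bit positions, so sessions that share their heavily weighted fields end up with a small Hamming distance even if lightly weighted fields differ.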

Acknowledgment

This work was supported by the National Key R&D Program of China under Grant No. 2018YFC0806900 and No. 2018YFB0805004, the Beijing Municipal Science & Technology Commission under Project No. Z191100007119009, and NSFC under Grant No. 61902397, No. U2003111, and No. 61871378.

Author information

Authors and Affiliations

State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing, China
Zixian Tang, Qiang Wang, Wenhao Li, Huaifeng Bao, Feng Liu & Wen Wang
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Zixian Tang, Qiang Wang, Wenhao Li, Huaifeng Bao, Feng Liu & Wen Wang

Corresponding author

Correspondence to Wen Wang.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

About this paper

Cite this paper

Tang, Z., Wang, Q., Li, W., Bao, H., Liu, F., Wang, W. (2021). HSLF: HTTP Header Sequence Based LSH Fingerprints for Application Traffic Classification. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_5

DOI: https://doi.org/10.1007/978-3-030-77961-0_5
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer Science, Computer Science (R0)