
Web attack detection based on traps

  • Original Submission
Applied Intelligence

Abstract

Every website on the Internet is vulnerable to security attacks to some degree. These attacks change constantly, which makes it challenging to detect the latest, previously unknown ones. Our goal is to automate attack detection through incremental learning of the latest attack types. We have placed web traps across the Internet in such a way that regular users cannot find or interact with them, while they remain visible to standard hacker tools and methods. Consequently, we obtain a continuous stream of information about new types of attacks, unlike most datasets in the literature, which were created in artificial settings. To enable effective web attack detection with few false positives, we propose an efficient way to create a dataset by combining malicious requests collected by the traps with benign requests from a regular website. Since our goal is automation, we tested a large number of shallow and deep machine learning models for separating regular from malicious HTTP requests, using only simple features such as character n-grams. In addition to our dataset, we evaluated all models on the large, publicly available FWAF dataset. We also tested the models on zero-day attacks, with training and validation requests collected in separate time intervals. One of the biggest problems in machine learning is catastrophic forgetting: when trained on new data, a model forgets the knowledge it learned from previous examples. To mitigate this problem, we implemented three incremental learning approaches for web attack detection and obtained good results in testing.
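To make the pipeline described in the abstract more concrete, here is a minimal Python sketch under stated assumptions: requests captured by the traps are labeled malicious and requests from the regular website benign; each raw request is turned into hashed character trigrams; a linear classifier is updated online with passive-aggressive (PA-I) steps; and a reservoir-sampled replay buffer (Vitter's Algorithm R) mixes earlier examples into each new batch to reduce catastrophic forgetting. The trigram size, the hashing dimension, the PA-I classifier and the replay scheme are illustrative choices, not necessarily the models or the three incremental approaches evaluated in the paper, and all function and class names are hypothetical.

```python
# Illustrative sketch only: names and design choices are hypothetical,
# not the authors' implementation.
import random
import zlib
from collections import defaultdict


def char_ngrams(text, n=3):
    """Overlapping character n-grams of a raw HTTP request string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def featurize(request, n=3, dim=2 ** 18):
    """Sparse bag of hashed character n-grams (feature hashing is an assumption)."""
    features = defaultdict(float)
    for gram in char_ngrams(request, n):
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        features[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return dict(features)


def label_requests(trap_requests, benign_requests):
    """Trap traffic is labeled malicious (1), regular-site traffic benign (0)."""
    return [(r, 1) for r in trap_requests] + [(r, 0) for r in benign_requests]


class PassiveAggressive:
    """Binary linear classifier trained online with PA-I updates (Crammer et al., 2006)."""

    def __init__(self, C=1.0):
        self.C = C
        self.w = {}

    def score(self, x):
        return sum(self.w.get(i, 0.0) * v for i, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) > 0.0 else 0

    def update(self, x, label):
        y = 1.0 if label == 1 else -1.0
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(self.C, loss / sq_norm)  # PA-I step size, capped by C
            for i, v in x.items():
                self.w[i] = self.w.get(i, 0.0) + tau * y * v


class Reservoir:
    """Fixed-size uniform sample of every example seen so far (Algorithm R)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.capacity:
                self.items[j] = item


def train_increment(model, memory, new_batch, epochs=3, seed=0):
    """Train on a new batch mixed with replayed old examples, then update memory."""
    rng = random.Random(seed)
    mixed = list(new_batch) + list(memory.items)
    for _ in range(epochs):
        rng.shuffle(mixed)
        for request, label in mixed:
            model.update(featurize(request), label)
    for example in new_batch:
        memory.add(example)
```

Replay is one common way to counter forgetting: because the reservoir keeps a uniform sample of all batches seen so far, each incremental update also revisits older attack types instead of fitting only the newest ones. For a zero-day style evaluation, the batches passed to `train_increment` would come from earlier time intervals and the held-out requests from a later one.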


Figs. 1–7


Code Availability

We will make our code available immediately upon acceptance of the paper. Matplotlib and Seaborn were used to create the artwork.

Notes

  1. https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2019-2020.pdf

  2. http://www.secrepo.com/

  3. https://github.com/faizann24/Fwaf-Machine-Learning-driven-Web-Application-Firewall

  4. https://drive.google.com/drive/folders/1xduETJ2LPgrSMwSEhMNe1BpyInYV2ZUy

  5. We discussed whether standard HTTP commands defined by the HTTP protocol (GET, POST, PUT, OPTIONS, etc.) should be used for tokenization. After a thorough analysis of malicious requests, we found that attackers use custom HTTP commands that the protocol does not define. We therefore decided to keep HTTP commands in the requests of our TBWIDD dataset, as this allows detection of remote commands executed by attackers and of their control over web instances.

  6. Abbreviation for the minimum number of occurrences.

  7. https://colab.research.google.com/
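Notes 5 and 6 describe two tokenization decisions: the HTTP command stays in the request text, so that custom commands sent by attackers remain visible, and rare tokens are filtered by a minimum number of occurrences. Below is a minimal Python sketch of both, assuming that the command is the first space-separated field of a request and that the remainder is split into character trigrams. `tokenize_request`, `build_vocabulary`, `vectorize` and the `min_count` parameter are hypothetical names; the notes give neither the authors' actual tokenizer nor the abbreviation they use.

```python
# Hypothetical tokenization scheme illustrating notes 5 and 6.
from collections import Counter


def tokenize_request(request, n=3):
    """Keep the leading HTTP command (standard or custom) as one whole token,
    then emit character n-grams of the rest of the request."""
    method, _, rest = request.partition(" ")
    tokens = ["<method:" + method + ">"]
    tokens += [rest[i:i + n] for i in range(len(rest) - n + 1)]
    return tokens


def build_vocabulary(requests, n=3, min_count=2):
    """Keep tokens that occur at least `min_count` times across the corpus."""
    counts = Counter()
    for request in requests:
        counts.update(tokenize_request(request, n))
    return {token for token, count in counts.items() if count >= min_count}


def vectorize(request, vocabulary, n=3):
    """Count vector of in-vocabulary tokens; rare tokens are dropped."""
    return dict(Counter(t for t in tokenize_request(request, n) if t in vocabulary))
```

Keeping the command as a single token means that a non-standard verb gets its own feature, rather than being split into fragments shared with ordinary paths, once it occurs often enough to pass the `min_count` threshold.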


Funding

No funding was received for conducting this study.

Author information

Corresponding author

Correspondence to Nikola Stevanović.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Availability of Data and Material

Our TBWIDD dataset is publicly available; the link to it is given in the manuscript.


About this article


Cite this article

Stevanović, N., Todorović, B. & Todorović, V. Web attack detection based on traps. Appl Intell 52, 12397–12421 (2022). https://doi.org/10.1007/s10489-021-03077-9


  • DOI: https://doi.org/10.1007/s10489-021-03077-9
