Abstract
Web applications have become ubiquitous and offer a wide range of services, from content management and e-commerce to social networking. However, these applications are also prime targets for cyberattacks that exploit a variety of vulnerabilities. With the rising use of Ubiquitous Web Applications (UWA), which can be accessed globally from various devices, it is imperative to automate the detection and classification of these attacks. In this study, we detect and classify web attacks using several machine learning classification models. We conduct a comparative analysis of web attack classification results from Decision Tree, Random Forest, Support Vector Classifier (SVC), and K-Nearest Neighbor (KNN) models applied to web server logs, using multiple text feature vectorization techniques: the context-insensitive TF-IDF vectorizer, the bidirectional, context-aware BERT transformer, and a combination of both. We find that the Random Forest classifier performs best with BERT features extracted from the web server logs, with 99% accuracy and \(F_{1}\) score for classifying web attacks. We also find no significant accuracy gain from the transformer over the TF-IDF vectorizer for these text features, presumably because of the preprocessing techniques we apply to the command-like syntax. In addition, with TF-IDF vectorization, both the SVC and KNN models performed better than the Random Forest model at detecting and classifying web application attacks from web server logs.
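As a rough illustration of the TF-IDF route described above, the following minimal sketch vectorizes a handful of request lines and labels new ones with a nearest-neighbor vote. It is not the study's implementation: the log lines, labels, tokenizer, smoothed IDF variant, and all function names are assumptions made for this example, and only KNN is shown, in plain Python with no external libraries.

```python
import math
from collections import Counter

# Hypothetical, hand-made request lines standing in for preprocessed web server logs.
TRAIN = [
    ("get /index.php?id=1 union select password from users", "sqli"),
    ("get /item.php?id=5 or 1=1 select name from accounts", "sqli"),
    ("get /search?q=<script>alert(1)</script>", "xss"),
    ("get /comment?text=<script>document.cookie</script>", "xss"),
    ("get /index.html", "benign"),
    ("get /images/logo.png", "benign"),
]


def tokenize(line):
    """Split a request line on whitespace and a few URL delimiters (assumed preprocessing)."""
    for ch in "/?=&<>()":
        line = line.replace(ch, " ")
    return line.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """Return an L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(line, train_vecs, idf, k=1):
    """Label a request line by majority vote among its k most similar training vectors."""
    query = tfidf(tokenize(line), idf)
    ranked = sorted(train_vecs, key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Fit the vocabulary weights on the training lines, then vectorize them once.
train_docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(train_docs)
train_vecs = [(tfidf(doc, idf), label) for doc, (_, label) in zip(train_docs, TRAIN)]

if __name__ == "__main__":
    for request in (
        "get /login.php?user=admin union select 1",
        "get /page?name=<script>alert(2)</script>",
        "get /images/banner.png",
    ):
        print(request, "->", knn_predict(request, train_vecs, idf))
```

Because TF-IDF weights each token independently of its neighbors, the vector for a request depends only on which tokens occur and how often, which is what "context-insensitive" refers to above. A BERT-based variant would replace `tfidf` with contextual embeddings while leaving the classifier step unchanged.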
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ramamoorthy, J., Oladimeji, D., Garland, L., Liu, Q. (2023). Detection and Classification of Web Application Attacks. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science, vol 13926. Springer, Cham. https://doi.org/10.1007/978-3-031-36822-6_26
DOI: https://doi.org/10.1007/978-3-031-36822-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36821-9
Online ISBN: 978-3-031-36822-6
eBook Packages: Computer Science, Computer Science (R0)