Improving Web Application Firewalls with Automatic Language Detection

Nguyen, Tri-Chan-Hung; Le-Nguyen, Minh-Khoi; Le, Dinh-Thuan; Nguyen, Van-Hoa; Tôn, Long-Phuoc; Nguyen-An, Khuong

doi:10.1007/s42979-022-01327-2

Original Research
Published: 17 August 2022

SN Computer Science, Volume 3, article number 446 (2022)


Abstract

Cybersecurity has always been a major concern for internet applications, and the demand for website protection is on the rise. Nowadays, Web Application Firewalls (WAFs) are commonly used and trusted by website owners because they are convenient and protect against multiple types of attacks by filtering incoming network requests. WAFs are powered by rules written by security experts to prevent attackers from penetrating the protected websites. However, these rules have high false-positive rates, meaning they often block legitimate users' requests, and they require constant manual updates because attack methods keep evolving. A feasible alternative to static rule-based WAFs is to apply machine learning approaches that observe users' behavior, but such models are large to deploy and slow to run, whereas a WAF must handle each request within milliseconds. We have therefore developed a simple machine learning system that categorizes requests to support traditional WAFs. The module categorizes network requests by their language and determines whether each incoming request is abnormal (i.e., written in a different language from that of normal requests). The output of our model is combined with the result of a rule-based WAF (ModSecurity in our implementation) to decide whether the incoming request should be blocked. Our proposed approach, called the machine learning-assisted method, combines the latest programming language categorizer with ModSecurity, a generic open-source WAF, and in our experiments it achieves almost no false positives with acceptable detection rates.
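The abstract does not spell out how the two verdicts are merged, so the following Python sketch shows only one possible rule under stated assumptions. The language classifier is passed in as a plain function (standing in for a trained categorizer such as guesslang, whose API is not reproduced here); the set of "normal" languages is assumed to have been learned from benign traffic; and a request is blocked only when ModSecurity raises an alert and the detected language is abnormal. The names `Verdict`, `toy_classify`, and `combine` are hypothetical, and the keyword-based classifier is a toy placeholder, not the authors' model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Verdict:
    block: bool
    reason: str


def toy_classify(payload: str) -> str:
    """Hypothetical keyword-based stand-in for a trained language categorizer."""
    lowered = payload.lower()
    if any(token in lowered for token in ("select ", "union ", " or 1=1")):
        return "sql"
    if "<script" in lowered:
        return "html"
    return "plaintext"


def combine(
    payload: str,
    classify: Callable[[str], str],
    normal_languages: Iterable[str],
    modsec_flagged: bool,
) -> Verdict:
    """Assumed rule: block only when ModSecurity alerts AND the language is abnormal."""
    language = classify(payload)
    abnormal = language not in set(normal_languages)
    if modsec_flagged and abnormal:
        return Verdict(True, f"ModSecurity alert; unusual language '{language}'")
    if modsec_flagged:
        # Suppressing alerts on normal-looking payloads is how this sketch
        # trades some detection rate for fewer false positives.
        return Verdict(False, f"ModSecurity alert suppressed; '{language}' is normal")
    return Verdict(False, "no ModSecurity alert")


if __name__ == "__main__":
    normal = {"plaintext"}
    # An injection-like payload that ModSecurity is assumed to flag.
    print(combine("id=1' OR 1=1 --", toy_classify, normal, modsec_flagged=True))
    # A benign name containing a quote, a typical rule-based false positive.
    print(combine("name=O'Brien", toy_classify, normal, modsec_flagged=True))
```

Requiring both signals to agree is one way to cut false positives at some cost to detection, which is consistent with the trade-off the abstract reports; whether the authors use exactly this rule is not stated here.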


Notes

Available at https://github.com/SpiderLabs/ModSecurity.
Available at https://code.visualstudio.com/updates/v1_60#_automatic-language-detection.
Available at https://www.isi.csic.es/dataset.
Available at http://www.lirmm.fr/pkdd2007-challenge.
Available at https://www.omg.org/spec/ASTM/1.0.
Available at https://github.com/yoeo/guesslang.


Acknowledgements

We acknowledge the support of time and facilities from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, for this study.

Author information

Tri-Chan-Hung Nguyen and Minh-Khoi Le-Nguyen have contributed equally to this work.

Authors and Affiliations

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City, Vietnam
Tri-Chan-Hung Nguyen, Minh-Khoi Le-Nguyen, Dinh-Thuan Le & Khuong Nguyen-An
Vietnam National University Ho Chi Minh City, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam
Tri-Chan-Hung Nguyen, Minh-Khoi Le-Nguyen, Dinh-Thuan Le & Khuong Nguyen-An
Polaris Infosec Pte. Ltd, 384 Hoang Dieu Street, District 4, Ho Chi Minh City, Vietnam
Van-Hoa Nguyen
Faculty of Information Technology, Industrial University, 12 Nguyen Van Bao Street, Go Vap District, Ho Chi Minh City, Vietnam
Long-Phuoc Tôn

Corresponding author

Correspondence to Khuong Nguyen-An (ORCID: orcid.org/0000-0002-9910-6387).

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Future Data and Security Engineering 2021” guest edited by Tran Khanh Dang.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nguyen, TCH., Le-Nguyen, MK., Le, DT. et al. Improving Web Application Firewalls with Automatic Language Detection. SN COMPUT. SCI. 3, 446 (2022). https://doi.org/10.1007/s42979-022-01327-2

Received: 15 April 2022
Accepted: 20 June 2022
Published: 17 August 2022
DOI: https://doi.org/10.1007/s42979-022-01327-2

Part of a collection:

Future Data and Security Engineering 2021
