Web Bot Detection Based on Hidden Features of HTTP Access Log

Li, Kaiyuan; Xiang, Mingrong; Kakaiya, Mitalkumar; Kaul, Shashank; Wang, Xiaodong

doi:10.1007/978-3-031-33458-0_3

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 489)

Included in the following conference series:

International Conference on Testbeds and Research Infrastructures

Abstract

Web bots generate a large fraction of the traffic on present-day web servers. They not only threaten website security, performance and user privacy but also raise concerns about the scraping of valuable information and digital assets. Much prior research has explored traffic features, labelled legitimate-user and bot traffic, and built efficient machine-learning models to detect web bots. However, previous machine-learning methods detect web bots from the observable raw data, and this has become more challenging as the logic and technologies of web bots grow increasingly diverse and complex. In this research, we propose an autoencoder-based method that detects web bots by distinguishing the HTTP access behaviours of humans from those of web bots. Our method aims to find hidden features in the raw HTTP access data, allowing web bots whose raw features are scattered to be clustered together. Furthermore, we use a polar-coordinate transformation strategy to rotate the geometry of the hidden features, which resolves the clustering difficulties caused by the randomness of the neural network environment. Compared with competing methods, our approach improves web bot detection accuracy by about 30%.
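The abstract does not say exactly how the polar-coordinate transformation is applied, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the autoencoder produces 2-D hidden features and that two training runs differ mainly by a rotation of the latent space about the origin. Each feature is converted to polar coordinates (r, θ), and the angle of the feature centroid is subtracted, so runs that differ only by such a rotation yield the same layout before clustering. The function names `to_polar`, `from_polar` and `canonical_rotation`, and the choice of the centroid angle as the reference, are hypothetical.

```python
import numpy as np


def to_polar(z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Convert 2-D hidden features of shape (n, 2) to radius and angle arrays."""
    r = np.hypot(z[:, 0], z[:, 1])
    theta = np.arctan2(z[:, 1], z[:, 0])
    return r, theta


def from_polar(r: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Convert radius and angle arrays back to (n, 2) Cartesian coordinates."""
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


def canonical_rotation(z: np.ndarray) -> np.ndarray:
    """Rotate 2-D hidden features so that their centroid lies on the positive x-axis.

    Hypothetical normalisation (assumption, not from the paper): latent spaces
    from different training runs that differ only by a rotation about the
    origin are mapped to the same canonical orientation.
    """
    r, theta = to_polar(z)
    cx, cy = z.mean(axis=0)
    if np.hypot(cx, cy) < 1e-12:
        raise ValueError("centroid is at the origin; reference angle is undefined")
    ref = np.arctan2(cy, cx)  # angle of the feature centroid
    # Subtract the reference angle and wrap the result into [-pi, pi).
    theta_rot = np.mod(theta - ref + np.pi, 2 * np.pi) - np.pi
    return from_polar(r, theta_rot)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 2-D "hidden features" for two groups of sessions (illustrative only).
    z = np.vstack([rng.normal((3.0, 1.0), 0.2, (50, 2)),
                   rng.normal((1.0, 4.0), 0.2, (50, 2))])
    phi = 1.2  # arbitrary rotation standing in for a different training run
    rot = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    z_other_run = z @ rot.T
    print(np.allclose(canonical_rotation(z), canonical_rotation(z_other_run)))  # True
```

Working in polar coordinates makes the rotation a simple subtraction of one angle. The canonicalised features could then be passed to any clustering algorithm. If the latent spaces of different runs differ by more than a rotation (for example, by a reflection or scaling), this normalisation alone would not be enough.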

Acknowledgement

This research was supported by Webjet Limited, which provided valuable raw data. The data are first-hand and were collected in the year in which this research was conducted. The company's contribution was important to this research and should also be meaningful to the wider community.

Author information

Authors and Affiliations

Webjet Limited, Melbourne, Australia
Kaiyuan Li, Mitalkumar Kakaiya & Shashank Kaul
Deakin University, Geelong, Australia
Mingrong Xiang
Victoria University, Melbourne, Australia
Xiaodong Wang

Corresponding author

Correspondence to Mingrong Xiang.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Shui Yu
National Supercomputer Center, Jinan, China
Bruce Gu
CSIRO Data61, Sydney, NSW, Australia
Youyang Qu
Melbourne Polytechnic, Melbourne, VIC, Australia
Xiaodong Wang

About this paper

Cite this paper

Li, K., Xiang, M., Kakaiya, M., Kaul, S., Wang, X. (2023). Web Bot Detection Based on Hidden Features of HTTP Access Log. In: Yu, S., Gu, B., Qu, Y., Wang, X. (eds) Tools for Design, Implementation and Verification of Emerging Information Technologies. TridentCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-33458-0_3

DOI: https://doi.org/10.1007/978-3-031-33458-0_3
Published: 17 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33457-3
Online ISBN: 978-3-031-33458-0
eBook Packages: Computer Science, Computer Science (R0)