Abstract
Web bots generate a large fraction of the traffic on present-day Web servers. They not only threaten website security, performance and user privacy but also raise concerns about the scraping of valuable information and digital assets. Much research has explored traffic features, labelled legitimate-user and bot traffic, and built efficient machine-learning models to detect web bots. However, previous machine-learning methods detect web bots from the observable raw data. This has become more challenging as the logic and technologies behind web bots grow increasingly diverse and complex. In this research, we propose an autoencoder-based method that detects web bots by distinguishing the HTTP access behaviours of humans from those of web bots. Our method aims to find hidden features in the raw HTTP access data, so that web bots whose raw features are scattered can still be clustered together. Furthermore, we use a polar-coordinate transformation strategy to rotate the geometry of the hidden features, which resolves the clustering difficulties caused by randomness in the neural network environment. Compared with competing methods, our approach improves detection accuracy by about 30%.
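The abstract does not say how the polar-coordinate transformation is carried out, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes 2-D hidden features (for example, a two-unit autoencoder bottleneck) and assumes that separate training runs differ mainly by an arbitrary rotation of the latent space. It then removes that rotation by converting each centred point to polar form and subtracting a reference angle. Choosing the principal-axis angle as the reference, and using the skew sign to break the remaining 180° tie, are illustrative assumptions. The helper names `to_polar`, `from_polar`, `canonicalize_rotation` and `rotate` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_polar(x: float, y: float) -> Tuple[float, float]:
    """Cartesian (x, y) -> polar (r, theta), with theta in (-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)


def from_polar(r: float, theta: float) -> Point:
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


def canonicalize_rotation(points: List[Point]) -> List[Point]:
    """Hypothetical helper: rotate 2-D hidden features to a canonical pose.

    Assumption: runs differ by a rotation. The cloud is centred, its
    principal-axis angle is taken from the second moments, and that angle
    is subtracted in polar coordinates so the principal axis lies on x.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]

    # Second moments of the centred cloud.
    sxx = sum(x * x for x, _ in centred)
    syy = sum(y * y for _, y in centred)
    sxy = sum(x * y for x, y in centred)
    # Angle of the major principal axis (defined only modulo pi).
    axis = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # In polar form, a rotation is just a shift of the angle.
    aligned = [from_polar(r, t - axis) for r, t in (to_polar(x, y) for x, y in centred)]

    # Break the modulo-pi tie: rotate by pi if the skew along x is negative.
    if sum(x ** 3 for x, _ in aligned) < 0:
        aligned = [(-x, -y) for x, y in aligned]
    return aligned


def rotate(points: List[Point], angle: float) -> List[Point]:
    """Rotate points about the origin by `angle` radians (demo helper)."""
    return [from_polar(r, t + angle) for r, t in (to_polar(x, y) for x, y in points)]


if __name__ == "__main__":
    # Made-up toy hidden features: an elongated, skewed 2-D cloud.
    run_a = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (6.0, 0.5), (0.5, -0.3), (1.5, 0.4)]
    # The same cloud as another training run might embed it (rotated).
    run_b = rotate(run_a, 1.1)
    a, b = canonicalize_rotation(run_a), canonicalize_rotation(run_b)
    # Largest point-wise gap after canonicalisation (floating-point noise only).
    print(max(math.dist(p, q) for p, q in zip(a, b)))
```

Once both embeddings are mapped to the same orientation, a clustering step (for example, fixed distance thresholds) sees the same geometry regardless of the run. The sketch does not handle reflections, and it does not handle nearly isotropic clouds, for which the principal axis is undefined.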
Acknowledgement
This research was supported by Webjet Limited, which provided valuable first-hand raw data collected during the year of this research. This contribution made the research possible, and we hope the resulting work will be meaningful to the community.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Li, K., Xiang, M., Kakaiya, M., Kaul, S., Wang, X. (2023). Web Bot Detection Based on Hidden Features of HTTP Access Log. In: Yu, S., Gu, B., Qu, Y., Wang, X. (eds) Tools for Design, Implementation and Verification of Emerging Information Technologies. TridentCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-33458-0_3
DOI: https://doi.org/10.1007/978-3-031-33458-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33457-3
Online ISBN: 978-3-031-33458-0
eBook Packages: Computer Science, Computer Science (R0)