
Web Bot Detection Based on Hidden Features of HTTP Access Log

  • Conference paper
Tools for Design, Implementation and Verification of Emerging Information Technologies (TridentCom 2022)

Abstract

Web bots generate a large fraction of the traffic on present-day Web servers. They not only threaten website security, performance and user privacy but also raise concerns about the scraping of valuable information and digital assets. Much research has explored traffic features, labelled legitimate-user and bot traffic, and produced efficient machine-learning models to detect web bots. However, previous machine-learning methods detect web bots from observable raw data, which has become more challenging as the logic and technologies of web bots grow increasingly diverse and complex. In this research, we propose an Autoencoder-based method to detect web bots by distinguishing the HTTP access behaviours of humans from those of web bots. Our method aims to find hidden features in the raw HTTP access data, which allows web bots with scattered raw features to be clustered. Furthermore, we use a polar-coordinate transformation strategy to rotate the geometry of the hidden features and resolve the clustering difficulties caused by the randomness of the neural network environment. We compare our web bot detection performance with that of competing methods; our method yields an improvement of about 30% in accuracy.
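The polar-coordinate step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the hidden features are taken to be 2-D points, and the "rotation" is taken to mean turning the point cloud about the origin until its centroid lies on the positive x-axis. The function names `to_polar`, `centroid` and `align_rotation` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_polar(points: List[Point]) -> List[Tuple[float, float]]:
    """Convert Cartesian points to (radius, angle) pairs about the origin."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]


def centroid(points: List[Point]) -> Point:
    """Arithmetic mean of the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def align_rotation(points: List[Point]) -> List[Point]:
    """Rotate the cloud about the origin so its centroid lies on the +x axis.

    Assumption (not from the paper): two training runs produce hidden
    features that differ only by a rotation about the origin, and the
    centroid is not at the origin.
    """
    cx, cy = centroid(points)
    offset = math.atan2(cy, cx)  # current direction of the centroid
    aligned = []
    for radius, angle in to_polar(points):
        phi = angle - offset  # subtract the common angular offset
        aligned.append((radius * math.cos(phi), radius * math.sin(phi)))
    return aligned


def rotate(points: List[Point], angle: float) -> List[Point]:
    """Helper for the demo: rotate points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Two hypothetical "runs" whose hidden features differ only by a rotation.
run_a = [(1.0, 0.0), (2.0, 1.0), (3.0, -0.5)]
run_b = rotate(run_a, 1.3)
aligned_a = align_rotation(run_a)
aligned_b = align_rotation(run_b)
```

Under these assumptions, `aligned_a` and `aligned_b` coincide up to floating-point error, so a clustering algorithm applied afterwards sees the same geometry regardless of how a particular run happened to orient the hidden space.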



Acknowledgement

This research was supported by Webjet Limited, which provided valuable raw data for the research. The data are first-hand and were collected in the year of this research. This support contributed to the research and should be meaningful to the community.

Author information

Corresponding author

Correspondence to Mingrong Xiang.

Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Li, K., Xiang, M., Kakaiya, M., Kaul, S., Wang, X. (2023). Web Bot Detection Based on Hidden Features of HTTP Access Log. In: Yu, S., Gu, B., Qu, Y., Wang, X. (eds) Tools for Design, Implementation and Verification of Emerging Information Technologies. TridentCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-33458-0_3

  • DOI: https://doi.org/10.1007/978-3-031-33458-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33457-3

  • Online ISBN: 978-3-031-33458-0

  • eBook Packages: Computer Science, Computer Science (R0)
