Abstract
In this study, we introduce two novel features, the consecutive sequential request ratio and the standard deviation of page request depth, for improving the accuracy with which traditional data mining classifiers distinguish malicious from non-malicious web crawlers in static web server access logs. In the first experiment, we evaluate the new features on the classification of known well-behaved web crawlers and human visitors. In the second experiment, we evaluate the new features on the classification of malicious web crawlers, unknown visitors, well-behaved crawlers and human visitors. Classification performance is evaluated in terms of classification accuracy and F1 score. The experimental results demonstrate the potential of the two new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions.
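The abstract names the two features but does not define them, so the following is only a minimal sketch of how such per-session features might be computed. It assumes that page depth is the number of non-empty path segments in the requested URL, that the standard deviation is the population standard deviation, and that the consecutive sequential request ratio is the fraction of requests whose directory matches that of the immediately preceding request. The function names and these definitions are illustrative assumptions, not the authors' stated method.

```python
from statistics import pstdev
from urllib.parse import urlsplit


def page_depth(url: str) -> int:
    """Assumed depth: count of non-empty path segments ('/a/b/c.html' has 3)."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def directory_of(url: str) -> str:
    """Assumed directory: the path up to and including the last '/'."""
    path = urlsplit(url).path
    return path.rsplit("/", 1)[0] + "/"


def depth_std(session: list[str]) -> float:
    """Population standard deviation of page depth over one session."""
    if len(session) < 2:
        return 0.0
    return pstdev(page_depth(url) for url in session)


def consecutive_sequential_ratio(session: list[str]) -> float:
    """Assumed ratio: share of requests that stay in the previous request's directory."""
    if len(session) < 2:
        return 0.0
    dirs = [directory_of(url) for url in session]
    same = sum(1 for prev, cur in zip(dirs, dirs[1:]) if prev == cur)
    return same / (len(session) - 1)


# Hypothetical session: the requested URLs of one visitor, in time order.
session = ["/a/1.html", "/a/2.html", "/b/1.html", "/b/c/2.html"]
print(depth_std(session), consecutive_sequential_ratio(session))
```

Under these assumptions, a crawler that walks a site directory by directory would tend to show a high ratio, while the depth deviation reflects how widely a session ranges across the site hierarchy. Both values could then be appended to the feature vector of each session before training a classifier.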
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stevanovic, D., An, A., Vlajic, N. (2011). Detecting Web Crawlers from Web Server Access Logs with Data Mining Classifiers. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_52
DOI: https://doi.org/10.1007/978-3-642-21916-0_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21915-3
Online ISBN: 978-3-642-21916-0
eBook Packages: Computer Science, Computer Science (R0)