Abstract
This paper describes part of a web usage mining study conducted on log files obtained from a Belgian e-commerce company. The log files show that numerous web robots are active on the site. Most of these robots exhibit crawling behavior that differs radically from the browsing behavior of human visitors. Because the owners of the e-shop want information about the paths that human visitors follow through the site, it is crucial to remove these robot visits from the log files.
Several existing methods for web robot discovery are evaluated and compared, but none of them leads to satisfactory results. Therefore, a new technique is developed that identifies web robots successfully and reliably.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geens, N., Huysmans, J., Vanthienen, J. (2006). Evaluation of Web Robot Discovery Techniques: A Benchmarking Study. In: Perner, P. (ed.) Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. ICDM 2006. Lecture Notes in Computer Science, vol 4065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790853_10
DOI: https://doi.org/10.1007/11790853_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36036-0
Online ISBN: 978-3-540-36037-7
eBook Packages: Computer Science, Computer Science (R0)