A Crawler Guard for Quickly Blocking Unauthorized Web Robot

  • Conference paper
Cyberspace Safety and Security (CSS 2013)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8300)

Abstract

Web robots are now used for a number of useful navigational goals, such as statistical analysis, link checking, and resource collection. However, Web crawlers form a particular group of users whose traversals should not be counted in regular traffic analysis. Such disturbance affects site decision making in many ways: marketing campaigns, site restructuring, site personalization, and server balancing, to name a few. It is therefore necessary to detect the various robots correctly and as early as possible, so that robots operate only within the security policy. In this paper, we propose a crawler guard that detects and blocks unauthorized robots under the security policy. It can immediately differentiate robots by their functions (navigational goals), ensuring that only welcome robots that obey the security policy are allowed to view the protected Web pages. Our experiment focuses on how precisely the crawler guard can identify the viewing goal of a robot within a limited number of Web page hits. The experimental results show a detection accuracy of 100% with a request count smaller than 8.
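
The idea of reaching a verdict within a small request budget can be illustrated with a minimal sketch. It is not the paper's method, which the abstract does not describe. Everything here is an assumption made for illustration: the class name `CrawlerGuard`, the verdict labels, the trap and disallowed paths, the use of a `/robots.txt` fetch as a robot signal, and the budget of 8, which only echoes the request count mentioned in the abstract.

```python
from dataclasses import dataclass, field

# Assumption: a placeholder budget that echoes the abstract's "smaller than 8".
REQUEST_BUDGET = 8


@dataclass
class Session:
    """Per-client state (hypothetical structure)."""
    paths: list = field(default_factory=list)
    verdict: str = "undecided"


class CrawlerGuard:
    """Illustrative guard that classifies a client from its first few requests.

    Signals used here are assumptions, not taken from the paper:
      * requesting a hidden trap path   -> blocked
      * fetching /robots.txt            -> treated as a declared robot
      * a declared robot requesting a
        disallowed prefix               -> blocked (policy violation)
      * no robot signal within budget   -> presumed human
    """

    def __init__(self, trap_paths, disallowed_prefixes, budget=REQUEST_BUDGET):
        self.trap_paths = set(trap_paths)
        self.disallowed_prefixes = tuple(disallowed_prefixes)
        self.budget = budget
        self.sessions = {}

    def observe(self, client_id, path):
        """Record one request and return the client's current verdict."""
        session = self.sessions.setdefault(client_id, Session())
        if session.verdict == "blocked":
            return session.verdict  # a block is sticky
        session.paths.append(path)

        if path in self.trap_paths:
            session.verdict = "blocked"
        elif path == "/robots.txt":
            session.verdict = "robot"
        elif session.verdict == "robot" and path.startswith(self.disallowed_prefixes):
            session.verdict = "blocked"
        elif session.verdict == "undecided" and len(session.paths) >= self.budget:
            session.verdict = "presumed_human"
        return session.verdict


if __name__ == "__main__":
    guard = CrawlerGuard(trap_paths=["/trap"], disallowed_prefixes=["/private/"])
    for path in ["/robots.txt", "/index.html", "/private/report.html"]:
        print(path, guard.observe("client-a", path))
```

In this sketch a verdict is fixed as soon as a decisive signal appears, so a policy-violating robot is blocked on the offending request rather than after log analysis.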

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, JM. (2013). A Crawler Guard for Quickly Blocking Unauthorized Web Robot. In: Wang, G., Ray, I., Feng, D., Rajarajan, M. (eds) Cyberspace Safety and Security. CSS 2013. Lecture Notes in Computer Science, vol 8300. Springer, Cham. https://doi.org/10.1007/978-3-319-03584-0_1

  • DOI: https://doi.org/10.1007/978-3-319-03584-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03583-3

  • Online ISBN: 978-3-319-03584-0

  • eBook Packages: Computer Science, Computer Science (R0)
