A Soft Computing Prefetcher to Mitigate Cache Degradation by Web Robots

  • Conference paper
Advances in Neural Networks - ISNN 2017 (ISNN 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10261)

Abstract

This paper investigates the feasibility of a resource prefetcher able to predict future requests made by web robots, which are software programs rapidly overtaking human users as the dominant source of web server traffic. Such a prefetcher is a crucial first line of defense for web caches and content management systems that must service many requests while maintaining good performance. Our prefetcher marries a deep recurrent neural network with a Bayesian network to combine prior global data with local data about specific robots. Experiments with traffic logs from web servers across two universities demonstrate improved predictions over a traditional dependency graph approach. Finally, preliminary evaluation of a hypothetical caching system that incorporates our prefetching scheme is discussed.
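For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a generic dependency graph prefetcher. It is not the authors' implementation. The class name, the `window` and `threshold` parameters, and the example paths are all assumptions made for illustration. The idea is to count how often resource B is requested shortly after resource A, then prefetch B on a request for A when the observed fraction is high enough.

```python
from collections import Counter, defaultdict


class DependencyGraphPrefetcher:
    """Illustrative dependency-graph predictor (hypothetical names and defaults)."""

    def __init__(self, window=2, threshold=0.5):
        self.window = window          # how many later requests count as "following"
        self.threshold = threshold    # minimum arc confidence needed to prefetch
        self.node_counts = Counter()  # times each resource was requested
        self.arc_counts = defaultdict(Counter)  # src -> dst -> co-occurrence count

    def train(self, session):
        """Update counts from one ordered list of requested resources."""
        for i, src in enumerate(session):
            self.node_counts[src] += 1
            # Count each distinct follower once per occurrence of src.
            followers = set(session[i + 1 : i + 1 + self.window])
            followers.discard(src)
            for dst in followers:
                self.arc_counts[src][dst] += 1

    def predict(self, resource):
        """Return (candidate, confidence) pairs at or above the threshold."""
        total = self.node_counts[resource]
        if total == 0:
            return []  # never seen this resource, nothing to prefetch
        scored = [
            (dst, count / total)
            for dst, count in self.arc_counts.get(resource, {}).items()
        ]
        keep = [pair for pair in scored if pair[1] >= self.threshold]
        # Highest confidence first; ties broken by name for determinism.
        return sorted(keep, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up robot sessions, for illustration only.
    prefetcher = DependencyGraphPrefetcher(window=1, threshold=0.5)
    for session in (["/a", "/b", "/c"], ["/a", "/b", "/d"], ["/a", "/c"]):
        prefetcher.train(session)
    print(prefetcher.predict("/a"))
```

A predictor of this kind only uses pairwise co-occurrence counts from past sessions, which is one reason a sequence model may be a reasonable alternative for robot traffic.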

N. Xie and K. Brown are joint first authors of this paper.

Notes

  1. www.botsvsbrowsers.com.

  2. The RNN is labeled LSTM in the figures.

Acknowledgment

The authors thank Logan Rickert for data processing support, Maria-Carla Calzarossa for data from the University of Pavia, and Mark Anderson for data from Wright State University. This paper is based on work supported by the National Science Foundation (NSF) under Grant No. 1464104. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

Author information

Corresponding author

Correspondence to Derek Doran.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Xie, N., Brown, K., Rude, N., Doran, D. (2017). A Soft Computing Prefetcher to Mitigate Cache Degradation by Web Robots. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science, vol 10261. Springer, Cham. https://doi.org/10.1007/978-3-319-59072-1_63

  • DOI: https://doi.org/10.1007/978-3-319-59072-1_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59071-4

  • Online ISBN: 978-3-319-59072-1

  • eBook Packages: Computer Science, Computer Science (R0)
