Abstract
This paper investigates the feasibility of a resource prefetcher able to predict future requests made by web robots, which are software programs rapidly overtaking human users as the dominant source of web server traffic. Such a prefetcher is a crucial first line of defense for web caches and content management systems that must service many requests while maintaining good performance. Our prefetcher marries a deep recurrent neural network with a Bayesian network to combine prior global data with local data about specific robots. Experiments with traffic logs from web servers at two universities demonstrate improved predictions over a traditional dependency graph approach. Finally, we discuss a preliminary evaluation of a hypothetical caching system that incorporates our prefetching scheme.
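The abstract contrasts two ways of predicting a robot's next request: a dependency graph learned from past sessions, and a model that combines global evidence with evidence about one specific robot. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The abstract does not specify the Bayesian network's structure, so the combination step here is swapped for a simple Dirichlet-multinomial update. The function names and the parameters `window`, `threshold`, and `prior_strength` are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def build_dependency_graph(sessions, window=2):
    """Simplified dependency graph: arc A -> B is weighted by the fraction of
    requests for A that are followed by B within the next `window` requests."""
    node_counts = Counter()
    arc_counts = defaultdict(Counter)
    for session in sessions:
        for i, a in enumerate(session):
            node_counts[a] += 1
            # Count each distinct successor at most once per occurrence of A.
            for b in set(session[i + 1 : i + 1 + window]):
                if b != a:
                    arc_counts[a][b] += 1
    return {
        a: {b: c / node_counts[a] for b, c in succ.items()}
        for a, succ in arc_counts.items()
    }


def dg_predict(graph, current, threshold=0.5):
    """Resources whose arc confidence from `current` meets the threshold."""
    successors = graph.get(current, {})
    return sorted(b for b, p in successors.items() if p >= threshold)


def blend_global_and_local(global_probs, local_history, prior_strength=10.0):
    """Posterior mean of a Dirichlet-multinomial model (hypothetical stand-in
    for the paper's Bayesian network).

    global_probs: next-resource distribution from a global model, e.g. a
        recurrent network's softmax output; assumed to sum to 1.
    local_history: resources this particular robot has requested so far.
    prior_strength: pseudo-count weight given to the global prior.
    """
    counts = Counter(local_history)
    n = sum(counts.values())
    resources = set(global_probs) | set(counts)
    return {
        r: (prior_strength * global_probs.get(r, 0.0) + counts[r])
        / (prior_strength + n)
        for r in resources
    }


def prefetch_candidates(probs, k=2):
    """Top-k resources by probability, ties broken alphabetically."""
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [r for r, _ in ranked[:k]]
```

For instance, with a global prior of `{"html": 0.5, "img": 0.3, "pdf": 0.2}`, `prior_strength=4.0`, and a robot that has requested `pdf` three times and `html` once, the blended distribution is pdf 0.475, html 0.375, img 0.15, so a top-2 prefetcher would fetch `pdf` and then `html`. The design choice this illustrates is that the global prior dominates while a robot's local history is short and fades as that history grows; with no local history, the global distribution is returned unchanged.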
N. Xie and K. Brown are joint first authors of this paper.
Notes
- 1.
- 2. The RNN is labeled LSTM in the figures.
Acknowledgment
The authors thank Logan Rickert for data processing support, Maria-Carla Calzarossa for data from the University of Pavia, and Mark Anderson for data from Wright State University. This paper is based on work supported by the National Science Foundation (NSF) under Grant No. 1464104. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Xie, N., Brown, K., Rude, N., Doran, D. (2017). A Soft Computing Prefetcher to Mitigate Cache Degradation by Web Robots. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science, vol 10261. Springer, Cham. https://doi.org/10.1007/978-3-319-59072-1_63
DOI: https://doi.org/10.1007/978-3-319-59072-1_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59071-4
Online ISBN: 978-3-319-59072-1
eBook Packages: Computer Science, Computer Science (R0)