Abstract
This paper considers a Markovian model for the optimal dynamic scheduling of page refreshes in a local repository of copies of randomly evolving remote web pages. A limited number of refresh agents, e.g., crawlers for web search engines, visit the remote pages to refresh their copies, which raises the need for effective scheduling policies. Maintaining the copies yields utilities and incurs costs, which are incorporated into a performance objective to be optimized. The paper develops a low-complexity, closed-form heuristic dynamic index policy, together with an upper bound on the optimal performance, by adapting a general approach of Whittle. The existence and evaluation of the index are resolved by methods introduced earlier by the author. A numerical study provides evidence that the proposed policy is consistently near optimal and may substantially outperform a myopic baseline policy.
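As a rough illustration of how a dynamic index policy operates in this kind of setting, the sketch below assumes that each page has a state (here, hypothetically, the number of epochs since its last refresh), that an index function maps a page and its state to a priority, and that at each decision epoch the available refresh agents visit the pages with the largest index values. The names `select_pages_to_refresh`, `toy_index`, and `CHANGE_RATES` are assumptions introduced for illustration; in particular, `toy_index` is a placeholder and not the closed-form index developed in the paper.

```python
from typing import Callable, List, Sequence


def select_pages_to_refresh(
    states: Sequence[int],
    index_fn: Callable[[int, int], float],
    num_agents: int,
) -> List[int]:
    """Return the ids of the num_agents pages with the largest index values.

    states[i] is the current state of page i, and index_fn(i, states[i])
    is its priority. Ties are broken in favor of the lower page id.
    """
    ranked = sorted(
        range(len(states)),
        key=lambda i: index_fn(i, states[i]),
        reverse=True,
    )
    return ranked[:num_agents]


# Hypothetical per-page change rates (an assumption, for illustration only).
CHANGE_RATES = [0.1, 0.5, 0.3, 0.2]


def toy_index(page: int, age: int) -> float:
    """Placeholder index: change rate times epochs since the last refresh."""
    return CHANGE_RATES[page] * age


# One decision epoch with four pages and two refresh agents.
ages = [4, 1, 3, 2]
chosen = select_pages_to_refresh(ages, toy_index, num_agents=2)
print(chosen)
```

The appeal of this structure is that each page's priority depends only on its own state, so a decision epoch costs one index evaluation per page plus a sort, regardless of how the index itself was derived.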
References
Wolf, J.L., Squillante, M.S., Yu, P.S., Sethuraman, J., Ozsen, L.: Optimal crawling strategies for web search engines. In: Proc. 11th Int. Conf. World Wide Web, WWW 2002, pp. 136–147. ACM, New York (2002)
Cho, J., García-Molina, H.: Effective page refresh policies for Web crawlers. ACM Trans. Database Syst. 28, 390–426 (2003)
Ling, Y., Mi, J.: An optimal trade-off between content freshness and refresh cost. J. Appl. Probab. 41, 721–734 (2004)
Lewandowski, D.: A three-year study on the freshness of web search engine databases. J. Information Sci. 34, 817–831 (2008)
Olston, C., Najork, M.: Web crawling. Found. Trends Info. Retrieval 4, 175–246 (2010)
Raiss-El-Fenni, M., El-Azouzi, R., Menasché, D., Xu, Y.: Optimal sensing policies for smartphones in hybrid networks: A POMDP approach. In: Proc. 6th Int. Conf. Performance Eval. Method. Tools (VALUETOOLS 2012), pp. 89–98. ICST (2012)
Papadimitriou, C.H., Tsitsiklis, J.N.: The complexity of optimal queuing network control. Math. Oper. Res. 24, 293–305 (1999)
Whittle, P.: Restless bandits: Activity allocation in a changing world. In: Gani, J. (ed.) A Celebration of Applied Probability, UK. J. Appl. Probab. Trust, Sheffield, vol. 25, pp. 287–298 (1988)
Niño-Mora, J.: Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33, 76–98 (2001)
Niño-Mora, J.: Dynamic allocation indices for restless projects and queueing admission control: A polyhedral approach. Math. Program. 93, 361–413 (2002)
Niño-Mora, J.: Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 31, 50–84 (2006)
Niño-Mora, J.: Dynamic priority allocation via restless bandit marginal productivity indices. Top 15, 161–198 (2007)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
Weber, R.R., Weiss, G.: On an index policy for restless bandits. J. Appl. Probab. 27, 637–648 (1990)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Niño-Mora, J. (2014). A Dynamic Page-Refresh Index Policy for Web Crawlers. In: Sericola, B., Telek, M., Horváth, G. (eds) Analytical and Stochastic Modeling Techniques and Applications. ASMTA 2014. Lecture Notes in Computer Science, vol 8499. Springer, Cham. https://doi.org/10.1007/978-3-319-08219-6_4
DOI: https://doi.org/10.1007/978-3-319-08219-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08218-9
Online ISBN: 978-3-319-08219-6
eBook Packages: Computer Science (R0)