ABSTRACT
Caches of results are critical components of modern Web search engines, because they lower response times for frequent queries and reduce the load on the search engine backend. Results in long-lived cache entries may become stale, however, as search engines continuously update their index to incorporate changes to the Web. Consequently, it is important to provide mechanisms that control the degree of staleness of cached results, ideally enabling the search engine to always return fresh results. In this paper, we present a new mechanism that identifies and invalidates, online, cached query results that have become stale. The basic idea is to evaluate, at query time and against recent changes, whether the results of a cache hit have changed. To improve invalidation efficiency, the generation time of cached queries and their chronological order with respect to the latest index update are used to prune unaffected queries early. We evaluate the proposed approach using documents that change over time and query logs of the Yahoo! search engine. We show that the proposed approach ensures good query results (50% fewer stale results) and high invalidation accuracy (90% fewer unnecessary invalidations) compared to a baseline approach that makes invalidation decisions off-line. More importantly, the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.
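The query-time check described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `CacheEntry`, `OnlineInvalidator`, `record_update`, and `is_stale`, the integer logical timestamps, the in-memory update log, and the conjunctive term-matching test are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Hypothetical cached result for one query."""
    terms: frozenset     # query terms
    results: list        # cached document ids
    generated_at: int    # logical time at which the results were computed


@dataclass
class OnlineInvalidator:
    """Hypothetical log of recent index updates, kept in time order."""
    updates: list = field(default_factory=list)  # (time, doc_id, doc_terms)

    def record_update(self, time, doc_id, doc_terms):
        self.updates.append((time, doc_id, frozenset(doc_terms)))

    def is_stale(self, entry):
        # Early pruning: an entry generated at or after the latest index
        # update cannot be affected by any logged change (assumes results
        # computed at time t already reflect updates with time <= t).
        if not self.updates or entry.generated_at >= self.updates[-1][0]:
            return False
        # Otherwise, check only the changes newer than the entry.
        for time, doc_id, doc_terms in reversed(self.updates):
            if time <= entry.generated_at:
                break
            # Assumed staleness test: a cached document changed, or a
            # changed document now contains every query term.
            if doc_id in entry.results or entry.terms <= doc_terms:
                return True
        return False


# Usage: decide on a cache hit whether to serve or recompute the results.
invalidator = OnlineInvalidator()
invalidator.record_update(5, "d1", ["web", "cache"])
hit = CacheEntry(frozenset({"web"}), ["d9"], generated_at=3)
serve_from_cache = not invalidator.is_stale(hit)
```

The timestamp comparison is what keeps the check cheap in this sketch: entries newer than the last index update are accepted without scanning the update log at all.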
Index Terms
- Online result cache invalidation for real-time web search