DOI: 10.1145/2348283.2348369
research-article

Online result cache invalidation for real-time web search

Published: 12 August 2012

ABSTRACT

Caches of results are critical components of modern Web search engines, since they enable lower response times for frequent queries and reduce the load on the search engine backend. Results in long-lived cache entries may become stale, however, as search engines continuously update their index to incorporate changes to the Web. Consequently, it is important to provide mechanisms that control the degree of staleness of cached results, ideally enabling the search engine to always return fresh results. In this paper, we present a new mechanism that identifies and invalidates, online, cached query results that have become stale. The basic idea is to evaluate, at query time and against recent changes, whether the results of cache hits have changed. To enhance invalidation efficiency, the generation time of cached queries and their chronological order with respect to the latest index update are used to prune unaffected queries early. We evaluate the proposed approach using documents that change over time and query logs of the Yahoo! search engine. We show that the proposed approach ensures good query results (50% fewer stale results) and high invalidation accuracy (90% fewer unnecessary invalidations) compared to a baseline approach that makes invalidation decisions off-line. More importantly, the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.
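The abstract gives only the outline of the mechanism, so the following is a minimal Python sketch of the general idea rather than the paper's algorithm. It assumes a logical clock, a time-ordered log of index updates, and a deliberately simple staleness test (an updated document affects a query if it contains all of the query's terms). The names `CacheEntry`, `IndexUpdate`, `ResultCache`, and the matching rule are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    results: list        # cached top-k document ids
    generated_at: int    # logical time at which the results were computed


@dataclass
class IndexUpdate:
    time: int            # logical time of the index update
    doc_id: str
    terms: frozenset     # terms of the added or modified document


@dataclass
class ResultCache:
    entries: dict = field(default_factory=dict)  # query string -> CacheEntry
    updates: list = field(default_factory=list)  # IndexUpdate log, ordered by time

    def put(self, query, results, now):
        self.entries[query] = CacheEntry(list(results), now)

    def record_update(self, update):
        self.updates.append(update)

    def lookup(self, query):
        """Return cached results, or None on a miss or an invalidated hit."""
        entry = self.entries.get(query)
        if entry is None:
            return None
        # Early pruning: an entry generated at or after the latest index
        # update cannot be affected by any logged change.
        if not self.updates or entry.generated_at >= self.updates[-1].time:
            return entry.results
        query_terms = set(query.split())
        # Scan only the updates that are newer than the cached entry.
        for update in reversed(self.updates):
            if update.time <= entry.generated_at:
                break  # older updates were already reflected in the entry
            # Assumed staleness test: the changed document matches the query.
            if query_terms <= update.terms:
                del self.entries[query]  # invalidate at query time
                return None
        return entry.results
```

In this sketch the invalidation decision is made only when a cached query is actually requested, and the timestamp comparison lets most hits skip the scan of recent updates entirely. A real system would need a far more precise test of whether a change alters the top-k results, as well as handling for document deletions.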

    • Published in

      SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
      August 2012
      1236 pages
      ISBN:9781450314725
      DOI:10.1145/2348283

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
