DOI: 10.1145/2505515.2507870
poster

Exploiting query term correlation for list caching in web search engines

Published: 27 October 2013

ABSTRACT

Caching technologies have been widely employed to boost the performance of Web search engines. Motivated by the correlation between terms in query logs from a commercial search engine, we explore the idea of a caching scheme based on pairs of terms, rather than on individual terms (the typical approach used by search engines today). We propose an inverted list caching policy, based on the Least Recently Used method, in which the co-occurrence correlation between terms in the query stream is accounted for when deciding which terms to keep in the cache. We consider not only term co-occurrence within the same query but also co-occurrence between separate queries. Experimental results show that the proposed approach can improve not only the cache hit ratio but also the overall throughput of the system when compared to existing list caching algorithms.
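The abstract does not spell out the policy itself, so the following is only a minimal sketch of one way an LRU list cache could take term co-occurrence into account. It is not the authors' algorithm. The class name `PairAwareLRU`, the `min_pair_count` threshold, and the `load_list` callback are all hypothetical, and only co-occurrence within the same query is modelled (the cross-query case mentioned above is left out). The idea illustrated: when a term is accessed, cached terms that have often appeared alongside it are refreshed too, so they are less likely to be evicted.

```python
from collections import Counter, OrderedDict
from itertools import combinations


class PairAwareLRU:
    """LRU cache of inverted lists that also refreshes correlated terms.

    Illustrative sketch only; names and thresholds are assumptions.
    """

    def __init__(self, capacity, min_pair_count=2):
        self.capacity = capacity              # max number of cached lists (>= 1)
        self.min_pair_count = min_pair_count  # hypothetical correlation threshold
        self.cache = OrderedDict()            # term -> inverted list, oldest first
        self.pair_counts = Counter()          # frozenset({a, b}) -> co-occurrences
        self.hits = 0
        self.misses = 0

    def _partners(self, term):
        # Cached terms that co-occurred with `term` often enough so far.
        return [
            t for t in self.cache
            if t != term
            and self.pair_counts[frozenset((term, t))] >= self.min_pair_count
        ]

    def _access(self, term, load_list):
        # Refresh correlated terms first so they survive any eviction below.
        for partner in self._partners(term):
            self.cache.move_to_end(partner)
        if term in self.cache:
            self.hits += 1
            self.cache.move_to_end(term)
            return
        self.misses += 1
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used list
        self.cache[term] = load_list(term)

    def process_query(self, terms, load_list):
        terms = sorted(set(terms))
        for term in terms:
            self._access(term, load_list)
        # Record within-query co-occurrence for every pair of distinct terms.
        for a, b in combinations(terms, 2):
            self.pair_counts[frozenset((a, b))] += 1


# Usage: `load_list` stands in for fetching an inverted list from disk.
cache = PairAwareLRU(capacity=3, min_pair_count=1)
for query in (["new", "york"], ["cat"], ["york"], ["dog"]):
    cache.process_query(query, load_list=lambda term: [])
print(list(cache.cache), cache.hits, cache.misses)
```

In this toy run, a plain LRU cache would evict "new" when "dog" arrives; here the access to "york" refreshes its partner "new", so "cat" is evicted instead. Whether such a heuristic helps in practice depends on the query stream and on the list sizes, which this sketch ignores.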


    • Published in

      CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
      October 2013
      2612 pages
      ISBN:9781450322638
      DOI:10.1145/2505515

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • poster

      Acceptance Rates

      CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
