ABSTRACT
Caching is widely used to improve the performance of Web search engines. Motivated by the correlation between terms in query logs from a commercial search engine, we explore a caching scheme based on pairs of terms rather than on individual terms, which is the typical approach used by search engines today. We propose an inverted list caching policy, based on the Least Recently Used (LRU) method, in which the co-occurrence of terms in the query stream is taken into account when deciding which terms to keep in the cache. We consider not only term co-occurrence within the same query but also co-occurrence across separate queries. Experimental results show that the proposed approach can improve both the cache hit ratio and the overall throughput of the system when compared to existing list caching algorithms.
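The abstract does not spell out the policy itself, so the following is only a rough sketch of what an LRU list cache that accounts for term co-occurrence might look like. Everything in it is an assumption made for illustration: the class name `PairAwareLRUCache`, the `load` callback that stands in for reading an inverted list from disk, and the rule that an access to a term also refreshes the recency of its most frequent co-occurring partner. It only counts co-occurrence within a query, not across separate queries, and it should not be read as the algorithm evaluated in the paper.

```python
from collections import Counter, OrderedDict
from itertools import combinations


class PairAwareLRUCache:
    """Hypothetical LRU cache of inverted lists with a co-occurrence refresh rule.

    Assumption (not from the paper): accessing a term also marks its most
    frequent co-occurring partner as recently used, if that partner is cached.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()    # term -> inverted list, oldest entry first
        self.pair_counts = Counter()  # frozenset({a, b}) -> co-occurrence count
        self.hits = 0
        self.misses = 0

    def _observe(self, terms):
        # Count every unordered pair of distinct terms in the same query.
        for a, b in combinations(sorted(set(terms)), 2):
            self.pair_counts[frozenset((a, b))] += 1

    def _best_partner(self, term):
        # Linear scan over observed pairs; adequate for a small sketch.
        best, best_count = None, 0
        for pair, count in self.pair_counts.items():
            if term in pair and count > best_count:
                (other,) = pair - {term}
                best, best_count = other, count
        return best

    def get(self, term, load):
        # Refresh the partner first so the requested term ends up most recent.
        partner = self._best_partner(term)
        if partner in self.cache:
            self.cache.move_to_end(partner)
        if term in self.cache:
            self.hits += 1
            self.cache.move_to_end(term)
        else:
            self.misses += 1
            self.cache[term] = load(term)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used
        return self.cache[term]

    def process_query(self, terms, load):
        self._observe(terms)
        return [self.get(t, load) for t in terms]


def fake_load(term):
    # Stand-in for fetching an inverted list; returns a dummy posting list.
    return [term]


cache = PairAwareLRUCache(capacity=3)
for query in (["new", "york"], ["paris"], ["new"], ["rome"]):
    cache.process_query(query, fake_load)

print(list(cache.cache), cache.hits, cache.misses)
```

In this toy run, the access to "new" also refreshes "york", so the later miss on "rome" evicts "paris" instead. A plain LRU cache of the same size would have evicted "york" at that point.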
Index Terms
- Exploiting query term correlation for list caching in web search engines