Abstract
Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent, or costly data used in the past. In this study, we propose and evaluate a static cache that works simultaneously as a list cache and an intersection cache, offering a more efficient way of handling cache space. In addition, we propose effective strategies to select the term pairs that should populate the cache. Simulations using two datasets and a real query log reveal that the proposed approach improves overall performance in terms of total processing time, achieving savings of up to 40% in the best case.
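To make the idea of a single cache serving both lists and intersections concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes one possible layout, which the abstract does not specify: each cached term pair is stored as three disjoint, sorted document-id lists (documents containing only the first term, documents containing both terms, and documents containing only the second). The class name `PairedCache`, its methods, and the restriction that each term belongs to at most one pair are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def merge_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted, disjoint doc-id lists into one sorted list."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


class PairedCache:
    """Hypothetical static cache: one entry answers list and intersection lookups.

    Assumption: every term appears in at most one cached pair, and the
    posting lists passed to add_pair are sorted by document id.
    """

    def __init__(self) -> None:
        # (t1, t2) with t1 < t2 -> (only_t1, both, only_t2)
        self._entries: Dict[Tuple[str, str], Tuple[List[int], List[int], List[int]]] = {}
        # term -> key of the pair that contains it
        self._pair_of: Dict[str, Tuple[str, str]] = {}

    def add_pair(self, t1: str, list1: List[int], t2: str, list2: List[int]) -> None:
        """Split the two posting lists into three disjoint parts and store them."""
        if t1 > t2:
            t1, list1, t2, list2 = t2, list2, t1, list1
        s1, s2 = set(list1), set(list2)
        only1 = [d for d in list1 if d not in s2]
        both = [d for d in list1 if d in s2]
        only2 = [d for d in list2 if d not in s1]
        key = (t1, t2)
        self._entries[key] = (only1, both, only2)
        self._pair_of[t1] = key
        self._pair_of[t2] = key

    def get_list(self, term: str) -> Optional[List[int]]:
        """Rebuild a full posting list from its 'only' part and the shared part."""
        key = self._pair_of.get(term)
        if key is None:
            return None  # cache miss
        only1, both, only2 = self._entries[key]
        own = only1 if term == key[0] else only2
        return merge_sorted(own, both)

    def get_intersection(self, t1: str, t2: str) -> Optional[List[int]]:
        """Return the precomputed intersection if this exact pair is cached."""
        key = (t1, t2) if t1 < t2 else (t2, t1)
        entry = self._entries.get(key)
        return None if entry is None else entry[1]


cache = PairedCache()
cache.add_pair("new", [1, 3, 5, 8], "york", [3, 4, 8, 9])
```

Under this assumed layout, the shared documents are stored once instead of appearing in both lists and again in a separate intersection entry, which is one way the same cache space could be used more efficiently. The price is a merge step whenever a full list is requested.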
This work was partially supported by EU-IRSES project EUSACOU 247574, by EU FET project MULTIPLEX 317532 and by UBACyT Project 20020120100058 “Herramientas algorítmicas avanzadas para aplicaciones de búsqueda en Internet - Parte 2”.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tolosa, G., Becchetti, L., Feuerstein, E., Marchetti-Spaccamela, A. (2014). Performance Improvements for Search Systems Using an Integrated Cache of Lists+Intersections. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_22
DOI: https://doi.org/10.1007/978-3-319-11918-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11917-5
Online ISBN: 978-3-319-11918-2
eBook Packages: Computer Science, Computer Science (R0)