Abstract
In this invited talk we address the algorithmic problems behind a truly distributed Web search engine. The main goal is to reduce the cost of a Web search engine while keeping all the benefits of a centralized search engine, in spite of the intrinsic network latency imposed by the Internet. The key ideas for achieving this goal are layered caching, online prediction mechanisms, and exploiting the locality and distribution of queries.
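The abstract names layered caching as one of the key ideas. The following is a minimal sketch, assuming a two-layer design in which a query-result cache sits in front of a posting-list cache, both using LRU eviction. A query first looks for a cached answer. Only on a miss does it fetch the posting lists of its terms, and it reads from the index only for terms that are not already cached. The class names, the `index` dictionary standing in for a real (possibly remote) inverted index, the LRU policy, the term-order normalization and the conjunctive (AND) query semantics are all illustrative assumptions. This is not the design presented in the talk.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


class TwoLevelSearchCache:
    """Hypothetical two-layer cache: query results first, then posting lists."""

    def __init__(self, index, result_capacity, posting_capacity):
        self.index = index  # assumed: term -> sorted list of doc ids
        self.results = LRUCache(result_capacity)
        self.postings = LRUCache(posting_capacity)
        self.index_reads = 0  # counts accesses to the expensive backing index

    def _posting_list(self, term):
        plist = self.postings.get(term)
        if plist is None:
            self.index_reads += 1  # second-layer miss: go to the index
            plist = self.index.get(term, [])
            self.postings.put(term, plist)
        return plist

    def search(self, query):
        # Assumed normalization: lower-case and sort terms so "a b" == "b a".
        key = " ".join(sorted(query.lower().split()))
        cached = self.results.get(key)
        if cached is not None:
            return cached  # first-layer hit: no posting lists touched
        terms = key.split()
        docs = set(self._posting_list(terms[0])) if terms else set()
        for term in terms[1:]:
            docs &= set(self._posting_list(term))  # conjunctive (AND) semantics
        result = sorted(docs)
        self.results.put(key, result)
        return result
```

For example, with `index = {"web": [1, 2, 3, 5], "search": [2, 3, 4]}`, the first call to `search("web search")` reads two posting lists from the index. A repeated query such as `search("Search Web")` is answered from the result cache without touching the index. The online prediction and query-locality ideas from the abstract are not modelled here.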
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R. (2010). Towards a Distributed Search Engine. In: Calamoneri, T., Diaz, J. (eds) Algorithms and Complexity. CIAC 2010. Lecture Notes in Computer Science, vol 6078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13073-1_1
DOI: https://doi.org/10.1007/978-3-642-13073-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13072-4
Online ISBN: 978-3-642-13073-1
eBook Packages: Computer Science, Computer Science (R0)