Towards a Distributed Search Engine

  • Conference paper
Algorithms and Complexity (CIAC 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6078)

Abstract

In this invited talk we address the algorithmic problems behind a truly distributed Web search engine. The main goal is to reduce the cost of a Web search engine while keeping all the benefits of a centralized search engine, in spite of the intrinsic network latency imposed by the Internet. The key ideas to achieve this goal are layered caching, online prediction mechanisms, and exploiting the locality and distribution of queries.
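Layered caching is named as one of the key ideas. The block below is a minimal illustrative sketch, not the design presented in the talk. It assumes a small site-local LRU result cache sitting in front of a larger shared cache tier, with a caller-supplied `evaluate` function standing in for the expensive query evaluation on the index. The names `LRUCache`, `LayeredResultCache`, `evaluate`, and the capacities used are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class LRUCache:
    """Fixed-capacity least-recently-used cache mapping queries to results."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, query: str) -> Optional[List[str]]:
        if query not in self._items:
            return None
        self._items.move_to_end(query)  # mark as most recently used
        return self._items[query]

    def put(self, query: str, results: List[str]) -> None:
        self._items[query] = results
        self._items.move_to_end(query)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used entry


class LayeredResultCache:
    """Hypothetical two-layer lookup: site-local cache, then a shared tier,
    then the costly index evaluation supplied by the caller."""

    def __init__(self, local_capacity: int, shared_capacity: int,
                 evaluate: Callable[[str], List[str]]) -> None:
        self.local = LRUCache(local_capacity)
        self.shared = LRUCache(shared_capacity)
        self.evaluate = evaluate
        self.hits: Dict[str, int] = {"local": 0, "shared": 0, "index": 0}

    def search(self, query: str) -> List[str]:
        results = self.local.get(query)
        if results is not None:
            self.hits["local"] += 1
            return results
        results = self.shared.get(query)
        if results is not None:
            self.hits["shared"] += 1
        else:
            # Miss in both layers: pay the full evaluation cost once.
            self.hits["index"] += 1
            results = self.evaluate(query)
            self.shared.put(query, results)
        self.local.put(query, results)  # promote into the nearest layer
        return results


# Usage: a toy query stream with a tiny local layer.
cache = LayeredResultCache(2, 8, lambda q: [f"doc-for-{q}"])
for q in ["a", "b", "a", "c", "d", "b"]:
    cache.search(q)
print(cache.hits)  # {'local': 1, 'shared': 1, 'index': 4}
```

In this toy stream the repeated query "b" is evicted from the small local layer but is still served from the shared tier instead of triggering a new index evaluation, which is the kind of saving a layered design aims for.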

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baeza-Yates, R. (2010). Towards a Distributed Search Engine. In: Calamoneri, T., Diaz, J. (eds) Algorithms and Complexity. CIAC 2010. Lecture Notes in Computer Science, vol 6078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13073-1_1

  • DOI: https://doi.org/10.1007/978-3-642-13073-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13072-4

  • Online ISBN: 978-3-642-13073-1

  • eBook Packages: Computer Science, Computer Science (R0)
