Learning to Distribute Queries into Web Search Nodes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Abstract

Web search engines are composed of a large set of search nodes and a broker machine that feeds them with queries. A location cache keeps minimal information in the broker to record which search nodes are capable of producing the top-N results for frequent queries. In this paper we show that the location cache can serve as a training dataset for a standard machine learning algorithm, yielding a predictive model of the search nodes expected to produce the best approximate results for a query. The model can be used to keep the broker from sending queries to all search nodes during sudden peaks in query traffic and, as a result, to avoid search node saturation. We propose a logistic regression model to quickly predict the most pertinent search nodes for a given query.
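
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features or the training procedure, so everything below is an assumption made for illustration: queries are represented as bags of terms, one independent binary logistic regression is trained per search node with plain stochastic gradient descent, and the toy `location_cache`, the function names (`train`, `predict_nodes`) and the hyperparameters are all hypothetical. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(query):
    # Assumption: a query is represented by its set of lowercase terms.
    return set(query.lower().split())


def train(location_cache, num_nodes, epochs=200, lr=0.5):
    """Fit one binary logistic regression per search node.

    location_cache maps a query string to the set of node ids that
    produced its top-N results (hypothetical layout).
    """
    weights = [defaultdict(float) for _ in range(num_nodes)]
    bias = [0.0] * num_nodes
    for _ in range(epochs):
        for query, nodes in location_cache.items():
            terms = features(query)
            for n in range(num_nodes):
                y = 1.0 if n in nodes else 0.0
                z = bias[n] + sum(weights[n][t] for t in terms)
                grad = sigmoid(z) - y  # gradient of the log-loss w.r.t. z
                bias[n] -= lr * grad
                for t in terms:
                    weights[n][t] -= lr * grad
    return weights, bias


def predict_nodes(query, weights, bias, k):
    """Return the ids of the k nodes with the highest predicted probability."""
    terms = features(query)
    scored = []
    for n in range(len(bias)):
        z = bias[n] + sum(weights[n].get(t, 0.0) for t in terms)
        scored.append((sigmoid(z), n))
    scored.sort(reverse=True)
    return [n for _, n in scored[:k]]


# Toy location cache (made-up data): query -> nodes holding its top-N results.
location_cache = {
    "cheap flights": {0},
    "flights to paris": {0, 1},
    "paris hotels": {1, 2},
    "cheap hotels": {2},
}

weights, bias = train(location_cache, num_nodes=3)

# Under load, the broker would contact only k nodes instead of all three.
print(predict_nodes("flights paris", weights, bias, k=2))
```

In this sketch the value of `k` is the knob a broker could lower during a traffic peak: fewer nodes are contacted per query, at the cost of possibly missing some top-N results.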




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mendoza, M., Marín, M., Ferrarotti, F., Poblete, B. (2010). Learning to Distribute Queries into Web Search Nodes. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_26

  • DOI: https://doi.org/10.1007/978-3-642-12275-0_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science, Computer Science (R0)
