Flood Little, Cache More: Effective Result-Reuse in P2P IR Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4947)

Abstract

State-of-the-art Peer-to-Peer Information Retrieval (P2P IR) systems lack response-time guarantees, especially at scale. To address this issue, a number of techniques for caching multi-term inverted-list intersections and query results have been proposed recently. Although these enable fast query evaluation with low network overhead, they do not consider how caching could also improve result quality. In this paper, we propose a cache-aware query routing scheme that not only reduces the response delay for a query but also offers an opportunity to improve result quality while keeping network usage low. We make three contributions. First, we develop a cache-aware, multi-round query routing strategy that balances query efficiency against result quality. Next, we propose aggressively reusing cached results, even those of subsets of a query, as an approximate caching technique that can drastically reduce bandwidth overhead, and we study the conditions under which such a scheme retains good result quality. Finally, we empirically evaluate these techniques on a fully functional P2P IR system, using a large-scale Wikipedia benchmark and both synthetic and real-world query workloads. Our results show that combining result caching with multi-round, cache-aware query routing can reduce network traffic by more than half while doubling result quality.
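
The idea of reusing cached results for subsets of a query can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it only assumes a hypothetical in-memory cache keyed by sets of query terms and a hypothetical helper `lookup_subset_results` that returns the entry for the largest cached subset of the query. How such partial results are combined, and when they preserve result quality, is what the paper itself studies.

```python
from itertools import combinations


def lookup_subset_results(query_terms, cache):
    """Find the largest subset of the query terms that has a cached result.

    `cache` is a hypothetical dict mapping frozenset-of-terms to a list of
    document ids. Returns (matched_terms, results), or (None, None) when no
    subset of the query is cached.
    """
    terms = sorted(set(query_terms))
    # Try the full query first, then progressively smaller subsets.
    for size in range(len(terms), 0, -1):
        for subset in combinations(terms, size):
            key = frozenset(subset)
            if key in cache:
                return key, cache[key]
    return None, None


# Illustrative cache contents (made up for this example).
cache = {
    frozenset({"p2p", "search"}): ["d1", "d2"],
    frozenset({"p2p"}): ["d3"],
}

matched, results = lookup_subset_results(["p2p", "search", "caching"], cache)
missed, no_results = lookup_subset_results(["routing"], cache)
```

Enumerating every subset is exponential in the number of query terms, which is tolerable only because keyword queries are typically short; a real system would need a cheaper lookup.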

Editor information

Jayant R. Haritsa, Ramamohanarao Kotagiri, Vikram Pudi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zimmer, C., Bedathur, S., Weikum, G. (2008). Flood Little, Cache More: Effective Result-Reuse in P2P IR Systems. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_19

  • DOI: https://doi.org/10.1007/978-3-540-78568-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78567-5

  • Online ISBN: 978-3-540-78568-2

  • eBook Packages: Computer Science, Computer Science (R0)
