Text-Based Content Search and Retrieval in Ad-hoc P2P Communities

  • Conference paper
  • In: Web Engineering and Peer-to-Peer Computing (NETWORKING 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2376)

Abstract

We consider the problem of content search and retrieval in peer-to-peer (P2P) communities. P2P computing is a potentially powerful model for information sharing between ad hoc groups of users because of its low cost of entry and natural model for resource scaling. As P2P communities grow, however, locating information distributed across the large number of peers becomes problematic. We address this problem by adapting a state-of-the-art text-based document ranking algorithm, the vector-space model instantiated with the TFxIDF ranking rule, to the P2P environment. We make three contributions: (a) we show how to approximate TFxIDF using compact summaries of individual peers’ inverted indexes rather than the inverted index of the entire communal store; (b) we develop a heuristic for adaptively determining the set of peers that should be contacted for a query; and (c) we show that our algorithm tracks TFxIDF’s performance very closely, giving P2P communities a search and retrieval algorithm as good as that possible assuming a centralized server.
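
The two ranking ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weighting formulas, the function names `tfidf_rank` and `rank_peers`, and the use of plain Python sets as a stand-in for the compact per-peer summaries are all assumptions made for illustration. The first function ranks documents with a common TFxIDF variant over a fully visible collection; the second ranks peers using only which terms each peer reports holding, the kind of information a summary could provide.

```python
import math
from collections import Counter


def tfidf_rank(query_terms, docs):
    """Rank documents by a simple TFxIDF score.

    docs maps a document name to its list of terms. The weighting
    (log-scaled TF, log(1 + N/df) IDF, length normalization) is one
    common variant, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for t in query_terms:
            if tf[t] > 0:
                idf = math.log(1 + n_docs / df[t])
                score += (1 + math.log(tf[t])) * idf
        # Normalize by length so long documents are not favored.
        scores[name] = score / math.sqrt(len(terms)) if terms else 0.0
    # Highest score first; ties broken by name for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rank_peers(query_terms, peer_summaries):
    """Rank peers using only per-peer term summaries.

    peer_summaries maps a peer name to the set of terms it holds
    (a hypothetical stand-in for a compact index summary). Each query
    term is weighted by how few peers hold it.
    """
    n_peers = len(peer_summaries)
    # Peer frequency: number of peers whose summary contains each term.
    pf = Counter()
    for terms in peer_summaries.values():
        pf.update(terms)

    scores = {}
    for peer, terms in peer_summaries.items():
        scores[peer] = sum(
            math.log(1 + n_peers / pf[t]) for t in query_terms if t in terms
        )
    # Peers with no matching term are dropped: nothing suggests contacting them.
    matching = [p for p in scores if scores[p] > 0]
    return sorted(matching, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    docs = {
        "a": ["gossip", "bloom", "filter"],
        "b": ["bloom", "bloom", "index", "search"],
        "c": ["network", "routing"],
    }
    print(tfidf_rank(["bloom", "filter"], docs))

    peers = {"p1": {"bloom", "filter"}, "p2": {"bloom"}, "p3": {"routing"}}
    print(rank_peers(["bloom", "filter"], peers))
```

In a sketch like this, a query would first be run through `rank_peers` to decide which peers to contact, and only those peers would score their own documents. How many of the ranked peers to contact is the adaptive question the abstract's second contribution addresses, and it is not modeled here.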

This work was supported in part by NSF grants EIA-0103722 and EIA-9986046.




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cuenca-Acuna, F.M., Nguyen, T.D. (2002). Text-Based Content Search and Retrieval in Ad-hoc P2P Communities. In: Gregori, E., Cherkasova, L., Cugola, G., Panzieri, F., Picco, G.P. (eds) Web Engineering and Peer-to-Peer Computing. NETWORKING 2002. Lecture Notes in Computer Science, vol 2376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45745-3_20

  • DOI: https://doi.org/10.1007/3-540-45745-3_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44177-9

  • Online ISBN: 978-3-540-45745-9

  • eBook Packages: Springer Book Archive
