Abstract
Recent advances in peer-to-peer (P2P) search algorithms have presented viable structured and unstructured approaches for full-text search. We posit that each of these existing approaches is best suited to a different type of query. We present PHIRST, the first system to facilitate effective full-text search within P2P networks. PHIRST works by leveraging the relative strengths of both approaches. As in structured approaches, agents first publish the terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system's storage requirements. During query lookup, agents use unstructured searches to compensate for the lack of fully published terms. Additionally, they explicitly weigh the costs of the structured and unstructured approaches, allowing for a significant reduction in query costs. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near-perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.
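The publish-then-lookup idea described in the abstract can be illustrated with a minimal sketch. It is not the PHIRST implementation: the class name `HybridIndex`, the per-term limit `cap`, and the simple rule in `plan` are assumptions made for illustration, and a single in-memory dictionary stands in for the structured overlay. The abstract does not specify the cost model, so the sketch only prefers a structured lookup when some query term is still fully published and falls back to an unstructured search otherwise.

```python
from collections import defaultdict


class HybridIndex:
    """Toy in-memory stand-in for a structured (DHT-style) term index.

    Hypothetical sketch: names, the cap, and the planning rule are
    assumptions, not details taken from the paper.
    """

    def __init__(self, cap=3):
        self.cap = cap                      # assumed per-term storage limit
        self.postings = defaultdict(set)    # term -> published document ids
        self.frequent = set()               # terms whose lists are incomplete

    def publish(self, doc_id, terms):
        """Publish a document's terms, but stop storing frequent terms."""
        for term in set(terms):
            docs = self.postings[term]
            if doc_id in docs:
                continue                    # already published
            if len(docs) < self.cap:
                docs.add(doc_id)
            else:
                self.frequent.add(term)     # list is now known to be partial

    def plan(self, query_terms):
        """Return ('structured', term) or ('unstructured', None)."""
        complete = [t for t in query_terms if t not in self.frequent]
        if complete:
            # Assumed cost proxy: the shortest complete posting list.
            best = min(complete, key=lambda t: len(self.postings.get(t, ())))
            return ("structured", best)
        # Every query term is frequent, so its matches are widespread and
        # an unstructured search is assumed to find them cheaply.
        return ("unstructured", None)


index = HybridIndex(cap=2)
index.publish("d1", ["the", "chord"])
index.publish("d2", ["the", "gnutella"])
index.publish("d3", ["the", "chord"])
print(index.plan(["the", "chord"]))
print(index.plan(["the"]))
```

In this toy run, "the" exceeds the cap and is marked frequent, so a query containing "chord" is planned as a structured lookup on that term, while a query for "the" alone is sent to the unstructured search.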
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rosenfeld, A., Goldman, C.V., Kaminka, G.A., Kraus, S. (2007). An Architecture for Hybrid P2P Free-Text Search. In: Klusch, M., Hindriks, K.V., Papazoglou, M.P., Sterling, L. (eds) Cooperative Information Agents XI. CIA 2007. Lecture Notes in Computer Science, vol 4676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75119-9_5
DOI: https://doi.org/10.1007/978-3-540-75119-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75118-2
Online ISBN: 978-3-540-75119-9
eBook Packages: Computer Science, Computer Science (R0)