Abstract
This paper describes the design and evaluation of a federated, peer-to-peer indexing system that integrates the resources of local systems into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of term-document indices through a combination of duplicate elimination, leaf set forwarding, and conventional techniques such as aggressive index pruning, index compression, and batching. Together, these indexing strategies reduce the number of RPC operations required to locate the nodes responsible for a section of the index, as well as the bandwidth utilization and latency of the indexing service. Using empirical observation, we evaluate the performance benefits of these cumulative optimizations and show that these design trade-offs can significantly improve indexing performance when using a distributed hash table.
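As a rough illustration of why duplicate elimination and batching reduce the number of RPC operations, the following minimal sketch groups term-document postings by the node responsible for each term before anything is sent. It is not the paper's implementation: the `Ring` class, the `build_batches` function, the node names, and the successor-style placement rule (each term is owned by the first node at or after its hashed key) are all assumptions made for the example, and leaf set forwarding, pruning, and compression are not modelled.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def key_of(text: str) -> int:
    """Hash a string onto the identifier ring (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class Ring:
    """Toy identifier ring: a term is owned by the first node at or after its key."""

    def __init__(self, node_names):
        self._nodes = sorted((key_of(name), name) for name in node_names)
        self._keys = [key for key, _ in self._nodes]

    def responsible_node(self, term: str) -> str:
        i = bisect_left(self._keys, key_of(term))
        # Wrap around to the first node when the key is past the last node.
        return self._nodes[i % len(self._nodes)][1]


def build_batches(ring, postings):
    """Group (term, doc_id) postings by responsible node, dropping duplicates."""
    batches = defaultdict(lambda: defaultdict(set))
    for term, doc_id in postings:
        # Using a set per term removes repeated (term, doc_id) pairs.
        batches[ring.responsible_node(term)][term].add(doc_id)
    return batches


# Hypothetical nodes and postings, for illustration only.
ring = Ring(["node-a", "node-b", "node-c"])
postings = [
    ("peer", "d1"), ("peer", "d1"), ("peer", "d2"),
    ("index", "d1"), ("hash", "d3"),
]
batches = build_batches(ring, postings)

# One message per posting versus one message per responsible node.
print(f"unbatched messages: {len(postings)}")
print(f"batched messages:   {len(batches)}")
```

In this sketch the number of messages is bounded by the number of distinct responsible nodes rather than by the number of postings, which is the general effect the abstract attributes to batching.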
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Casey, J., Zhou, W. (2005). Reducing the Bandwidth Requirements of P2P Keyword Indexing. In: Hobbs, M., Goscinski, A.M., Zhou, W. (eds) Distributed and Parallel Computing. ICA3PP 2005. Lecture Notes in Computer Science, vol 3719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564621_6
DOI: https://doi.org/10.1007/11564621_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29235-7
Online ISBN: 978-3-540-32071-5
eBook Packages: Computer Science, Computer Science (R0)