On the Usage of Global Document Occurrences in Peer-to-Peer Information Systems

Papapetrou, Odysseas; Michel, Sebastian; Bender, Matthias; Weikum, Gerhard

doi:10.1007/11575771_21


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3760)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"


Abstract

A number of approaches for query processing in Peer-to-Peer information systems efficiently retrieve relevant information from distributed peers. However, very few of them take the overlap between peers into account: because the most popular resources (e.g., documents or files) are often present at most of the peers, a large fraction of the documents eventually received by the query initiator are duplicates. We develop a technique based on the notion of global document occurrences (GDO) that, when processing a query, penalizes frequent documents increasingly as more and more peers contribute their local results. We argue that the additional effort to create and maintain the GDO information is reasonably low, as the necessary information can be piggybacked onto the existing communication. Early experiments indicate that our approach significantly decreases the number of peers that must be involved in a query to reach a given level of recall and thus reduces both user-perceived latency and the waste of network resources.
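The GDO idea described in the abstract can be illustrated with a small sketch. This is not the authors' scoring formula, which the abstract does not give. It assumes a hypothetical penalty in which a document held by g of N peers is down-weighted by (1 - g/N)^k once k peers have already contributed results, a rough model of the chance that k randomly chosen peers all missed that document. The function names `build_gdo`, `gdo_weight`, and `rerank_local` are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_gdo(peer_collections: Iterable[Iterable[str]]) -> Counter:
    """Count, for every document ID, how many peers hold it (its GDO value).

    Hypothetical: assumes each peer's document IDs can be aggregated, e.g. by
    piggybacking them on messages the peers exchange anyway.
    """
    gdo: Counter = Counter()
    for docs in peer_collections:
        gdo.update(set(docs))  # a peer counts at most once per document
    return gdo


def gdo_weight(gdo: int, num_peers: int, peers_seen: int) -> float:
    """Hypothetical penalty: probability that a document held by `gdo` of
    `num_peers` peers was missed by `peers_seen` independently sampled peers
    (sampling with replacement, a simplifying assumption)."""
    if num_peers <= 0:
        raise ValueError("num_peers must be positive")
    frac = min(gdo / num_peers, 1.0)
    return (1.0 - frac) ** peers_seen


def rerank_local(results: List[Tuple[str, float]], gdo: Dict[str, int],
                 num_peers: int, peers_seen: int) -> List[Tuple[str, float]]:
    """Re-rank one peer's local (doc_id, score) list before returning it,
    discounting frequent documents more as more peers have already answered."""
    adjusted = [
        # Unknown documents are assumed to exist only at the answering peer.
        (doc, score * gdo_weight(gdo.get(doc, 1), num_peers, peers_seen))
        for doc, score in results
    ]
    # Highest adjusted score first; ties broken by doc_id for determinism.
    return sorted(adjusted, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Ten peers: "popular" is held by 8 of them, "rare" by only 1.
    peers = [{"popular", "rare"}] + [{"popular"}] * 7 + [set(), set()]
    gdo = build_gdo(peers)
    local = [("popular", 0.9), ("rare", 0.6)]
    print(rerank_local(local, gdo, num_peers=10, peers_seen=0)[0][0])  # popular
    print(rerank_local(local, gdo, num_peers=10, peers_seen=2)[0][0])  # rare
```

Because the weight is 1 when no peer has answered yet, the first peer's ranking is unchanged. The discount only applies, and grows, as `peers_seen` increases, which mirrors the increasing penalty on frequent documents that the abstract describes.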



Author information

Authors and Affiliations

Max-Planck Institut für Informatik, 66123, Saarbrücken, Germany
Odysseas Papapetrou, Sebastian Michel, Matthias Bender & Gerhard Weikum


Editor information

Editors and Affiliations

STARLab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, RMIT University, Bld 10.10, 376-392 Swanston Street, 3001, Melbourne, VIC, Australia
Zahir Tari


About this paper

Cite this paper

Papapetrou, O., Michel, S., Bender, M., Weikum, G. (2005). On the Usage of Global Document Occurrences in Peer-to-Peer Information Systems. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575771_21


DOI: https://doi.org/10.1007/11575771_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29736-9
Online ISBN: 978-3-540-32116-3
eBook Packages: Computer Science, Computer Science (R0)
