DOI: 10.1145/1183579.1183582
Article

Efficient peer-to-peer semantic overlay networks based on statistical language models

Published: 11 November 2006

Abstract

In this paper we address the query routing problem in peer-to-peer (P2P) information retrieval. Our system builds on the idea of a Semantic Overlay Network (SON), in which each peer becomes a neighbor of a small number of peers, chosen among those that are most similar to it. Peers in the network are represented by a statistical Language Model derived from their local data collections. Instead of using the non-metric Kullback-Leibler divergence to compute the similarity between them, we use a related measure that is symmetric and a true metric, the square root of the Jensen-Shannon divergence, which lets us map the problem to a metric search problem. The search strategy exploits the triangle inequality to efficiently prune the search space and relies on a priority queue to visit the most promising peers first. To keep communication costs low and to compare Language Models efficiently, we devise a compression technique that builds on Bloom filters and histograms, and we provide error bounds for the approximation and a cost analysis for the algorithms used to build and maintain the SON.
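
The two core ideas of the abstract, a metric distance between Language Models and a best-first search pruned with the triangle inequality, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names (`js_distance`, `nearest_peer`), the single-pivot scheme with precomputed distances, the toy peers, and the representation of a Language Model as a dense probability list over a shared vocabulary are all assumptions made for illustration.

```python
import heapq
import math


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs, range [0, 1])."""
    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; b_i > 0 whenever a_i > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]  # mixture distribution
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def nearest_peer(query, peers, pivot, pivot_dists):
    """Best-first nearest-neighbor search with triangle-inequality pruning.

    pivot_dists[name] holds the precomputed distance d(pivot, name).
    For any peer x, |d(query, pivot) - d(pivot, x)| <= d(query, x),
    so that value is a lower bound usable as the queue priority.
    """
    d_qp = js_distance(query, peers[pivot])
    heap = [(abs(d_qp - d), name) for name, d in pivot_dists.items()]
    heapq.heapify(heap)

    best_name, best_dist, evaluated = None, math.inf, 0
    while heap:
        bound, name = heapq.heappop(heap)
        if bound >= best_dist:
            break  # every remaining peer has an even larger lower bound
        dist = js_distance(query, peers[name])
        evaluated += 1
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist, evaluated


# Hypothetical peers: unigram Language Models over a three-term vocabulary.
peers = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.1, 0.8, 0.1],
    "C": [0.1, 0.1, 0.8],
    "D": [0.6, 0.3, 0.1],
}
pivot = "A"
pivot_dists = {name: js_distance(peers[pivot], lm) for name, lm in peers.items()}

query = [0.68, 0.22, 0.10]
name, dist, evaluated = nearest_peer(query, peers, pivot, pivot_dists)
print(name, round(dist, 4), f"{evaluated} of {len(peers)} distances computed")
```

The pruning step is only sound because the distance obeys the triangle inequality; with the raw Kullback-Leibler divergence the lower bound would not hold, and the early `break` could discard the true nearest peer.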

Published In

P2PIR '06: Proceedings of the international workshop on Information retrieval in peer-to-peer networks
November 2006
66 pages
ISBN: 1595935274
DOI: 10.1145/1183579

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. language models
  2. metric space
  3. nearest neighbor search
  4. peer-to-peer
  5. semantic overlay network

Qualifiers

  • Article

Conference

CIKM06: Conference on Information and Knowledge Management
November 11, 2006
Arlington, Virginia, USA

Cited By

  • (2013) Centrality-based peer rewiring in semantic overlay networks: Short paper. IEEE 7th International Conference on Research Challenges in Information Science (RCIS), pp. 1-6. DOI: 10.1109/RCIS.2013.6577720. Online publication date: May 2013.
  • (2011) A Dynamic Cluster Construction Method Based on Characteristics of Query Generation in Peer-to-Peer Networks. Proceedings of the 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications, pp. 96-101. DOI: 10.1109/WAINA.2011.53. Online publication date: 22 Mar 2011.
  • (2011) SoFA. Expert Systems with Applications: An International Journal, 38:1, pp. 94-105. DOI: 10.1016/j.eswa.2010.06.020. Online publication date: 1 Jan 2011.
  • (2010) Feedback-Based Performance Tuning for Self-Organizing Multimedia Retrieval Systems. Proceedings of the 2010 Second International Conferences on Advances in Multimedia, pp. 102-108. DOI: 10.1109/MMEDIA.2010.19. Online publication date: 13 Jun 2010.
  • (2010) On Building a Self-Organizing Search System for Multimedia Retrieval. 2010 5th International Conference on Future Information Technology, pp. 1-7. DOI: 10.1109/FUTURETECH.2010.5482652. Online publication date: May 2010.
  • (2010) Leveraging Semantic Approximations in Heterogeneous XML Data Sharing Networks: The SUNRISE Approach. Soft Computing in XML Data Management, pp. 315-350. DOI: 10.1007/978-3-642-14010-5_12. Online publication date: 2010.
  • (2009) Query Routing Mechanisms in Self-Organizing Search Systems. Proceedings of the 2009 Second International Workshop on Similarity Search and Applications, pp. 132-139. DOI: 10.1109/SISAP.2009.13. Online publication date: 29 Aug 2009.
  • (2009) A Dynamic Cluster Construction Method Based on Query Characteristics in Peer-to-Peer Networks. Proceedings of the 2009 First International Conference on Advances in P2P Systems, pp. 168-173. DOI: 10.1109/AP2PS.2009.34. Online publication date: 11 Oct 2009.
  • (2009) Distributed Semantic Overlay Networks. Handbook of Peer-to-Peer Networking, pp. 463-494. DOI: 10.1007/978-0-387-09751-0_17. Online publication date: 15 Oct 2009.
  • (2008) Peer-to-peer similarity search over widely distributed document collections. Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, pp. 35-42. DOI: 10.1145/1458469.1458477. Online publication date: 30 Oct 2008.
