Semantic routing of search queries in P2P networks

https://doi.org/10.1016/j.jpdc.2008.06.005

Abstract

Peer-to-peer (P2P) networks are beginning to form the infrastructure of future applications, but heavy network traffic limits their scalability. Indexing is one method of reducing this traffic. However, indexes tend to become large as the network grows, and limiting their size causes loss of indexing information. In this paper we introduce a novel ontology based index (OI) which limits the size of the indexes without sacrificing indexing information. We show that the method can be employed by many P2P networks. The OI sits on top of the routing and maintenance modules of a P2P network and enhances them. The OI prunes branches of search trees which have no chance of leading to a response. The OI also guarantees that an enhanced routing algorithm and its basic version return the same result set for a given search query. This means that the OI reduces traffic without reducing quality of service. To measure the performance of the OI we apply it to Chord (DHT based) and HyperCup (non-DHT based) P2P networks and show that it reduces the networks’ traffic significantly.

Introduction

Peer-to-peer (P2P) overlay networks are growing rapidly and, unlike current applications such as Gnutella [21] and Morpheus [32], which are infrastructures for file sharing, they are developing into an infrastructure for different applications [56], [17], [25], [3]. With the advent of the semantic web [5] and semantic web services [30], many researchers and industry developers use these technologies in P2P networks. P2P networks are characterized by self-organization, symmetric communication and distributed control [44]. Self-organization means that the network automatically rearranges itself as nodes join and leave. Symmetric communication means that all nodes are both servers and clients. Distributed control means that there is no centralized server or servers [45]. Routing is one of the important problems in P2P networks, and it divides these networks into two categories: structured and unstructured [31]. Messages in unstructured networks are flooded throughout the entire network, whereas in structured networks messages are routed so that they traverse only a bounded number of hops (lookup). The earlier versions of Gnutella [21] were unstructured, while Chord [53], CAN [37], SkipNet [24], Coral [18], TOPLUS [20] and HyperCup [48] are examples of structured P2P networks. In unstructured P2P networks, and in some structured P2P networks such as HyperCup [48] which use broadcasting to search, a large volume of traffic is created during the search process. In addition, because key lookup is too limited to perform arbitrary queries, multicast and broadcast have been proposed as additions to structured (especially DHT based) P2P networks [13]. A large volume of traffic is therefore also a major problem for structured P2P networks that aim to support complex queries [13], [23], [52], [38].

Indexing methods were introduced to help the message routing process and reduce the traffic generated by broadcast algorithms [58], [9], [40]. Indexing can be perceived as the heart of P2P search methods. It captures a broad range of issues, as demonstrated by the search/index links model [8], [40]. A P2P index can be local, centralized or distributed. With a local index, a peer keeps references only to its own data. In a centralized index, a single server keeps references to data on many peers. With distributed indexes, pointers towards the target reside at several nodes [40].

Centralized schemes have the disadvantage of a single point of failure. About 36 million Napster [33] users lost their service because the single administration was vulnerable to the legal challenges of record companies [29]. Local and distributed indexes, in turn, tend to become large as the network grows, because their size grows with the number of objects present in the network. Moreover, limiting the size of these indexes narrows the search scope, so that some objects can no longer be reached through them. In this paper we introduce an ontology based indexing algorithm which addresses these problems by using a shared upper ontology as a reference for building indexes.

As examples, we also apply it to the HyperCup and Chord networks to reduce their overlay network traffic.

The rest of this paper is organized as follows. In Section 2 we review related work and compare our work with it. Section 3 presents our proposed ontology based index (OI), its data structure and routing algorithm, as well as the mechanism of ontology sharing in an OI based P2P network and the mechanism of resource and query annotation. In Section 4 we explain the implementation of the OI on HyperCup, and in Section 5 we discuss the implementation of the OI on the Chord network. In Section 6, after presenting our simulation methodology, we analyse the performance of the OI on the HyperCup and Chord networks. In Section 7 we compare the performance of the OI with a related work. Finally, in Section 8 we conclude our study.

Related work

Our work can be classified both as semantic annotation (meta-data) of resources and queries and as an indexing approach. From the perspective of semantic annotation, the following studies share the same goal.

A review of P2P systems that use meta-data is given in [26]. The authors consider a number of possible types of meta-data: document hash (an ID generated from the document contents via some hashing algorithm), document ID (an ID assigned arbitrarily to a document, according to some scheme), statistical

Ontology based index

In this section we explain our ontology based index (OI). The OI contains a distributed data structure and a routing algorithm which will be described in the following subsections.
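The general idea of pruning a search tree with a concept-level index can be illustrated with a minimal sketch. This is not the paper's actual data structure or routing algorithm, which are not given in this excerpt. It assumes a hypothetical toy ontology (`PARENT`), a broadcast tree (`tree`), and a per-link index (`index`) that records which ontology concepts annotate resources reachable through each link. Under these assumptions the size of each index entry is bounded by the number of concepts in the shared ontology rather than by the number of objects, and a pruned search returns the same hits as an unpruned one.

```python
# Hypothetical toy ontology (assumption): child concept -> parent concept.
PARENT = {"Jazz": "Music", "Rock": "Music", "Music": "Media", "Film": "Media"}


def ancestors_and_self(concept):
    """Return the concept together with all of its ancestors in PARENT."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def matches(resource_concept, query_concept):
    """A resource matches if its concept is the query concept or a descendant."""
    return query_concept in ancestors_and_self(resource_concept)


def build_index(tree, resources, node, index):
    """Fill index[(node, child)] with the concepts found in child's subtree.

    Returns the set of concepts found in the subtree rooted at node.
    """
    concepts = set(resources.get(node, ()))
    for child in tree.get(node, ()):
        child_concepts = build_index(tree, resources, child, index)
        index[(node, child)] = child_concepts
        concepts |= child_concepts
    return concepts


def search(tree, resources, index, root, query_concept, prune=True):
    """Walk the tree from root; return (nodes holding a match, messages sent)."""
    hits, messages = set(), 0
    stack = [root]
    while stack:
        node = stack.pop()
        if any(matches(c, query_concept) for c in resources.get(node, ())):
            hits.add(node)
        for child in tree.get(node, ()):
            link_concepts = index[(node, child)]
            # Skip a branch whose index shows no concept that can match.
            if prune and not any(matches(c, query_concept) for c in link_concepts):
                continue
            messages += 1
            stack.append(child)
    return hits, messages


# Hypothetical example network (assumption): node A is the query origin.
tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
resources = {"B": {"Film"}, "C": {"Rock"}, "D": {"Film"}, "E": {"Jazz"}}
index = {}
build_index(tree, resources, "A", index)

pruned_hits, pruned_messages = search(tree, resources, index, "A", "Music")
full_hits, full_messages = search(tree, resources, index, "A", "Music", prune=False)
```

In this toy network a query for "Music" finds the same two nodes with and without pruning, but the pruned search never enters the branch that holds only "Film" resources.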

Implementation of the OI on HyperCup

HyperCup is a symmetric overlay based on a hypercube. It consists of a structure, a broadcast algorithm and a management algorithm for joining and departing nodes. By limiting the number of links between nodes, HyperCup provides a broadcast algorithm that guarantees all nodes will receive the message with the least number of sent messages. Also, the distance between any two nodes (bounded by the diameter of the graph) is at most of logarithmic order, and each node holds the information of at most log_2 n
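The broadcast property can be illustrated with the standard dimension-ordered broadcast on a complete hypercube. The following is a sketch under stated assumptions, not HyperCup's own implementation: nodes are integers whose bits are hypercube coordinates, the hypercube is complete (2^d nodes), and the function name `hypercube_broadcast` is introduced here for illustration. A node that receives a message over dimension i forwards it only over dimensions greater than i, so every node is reached exactly once.

```python
from collections import deque


def hypercube_broadcast(dimensions, origin=0):
    """Broadcast from origin in a complete hypercube with 2**dimensions nodes.

    Returns (received, messages): the set of nodes that hold the message
    and the total number of messages sent.
    """
    received = {origin}
    messages = 0
    # Each entry is (node, first dimension the node may forward on).
    queue = deque([(origin, 0)])
    while queue:
        node, start_dim = queue.popleft()
        for dim in range(start_dim, dimensions):
            neighbor = node ^ (1 << dim)  # flip one coordinate bit
            messages += 1
            received.add(neighbor)
            # The neighbor forwards only on higher dimensions.
            queue.append((neighbor, dim + 1))
    return received, messages


received, messages = hypercube_broadcast(4)
```

For a 4-dimensional hypercube this reaches all 16 nodes with 15 messages, one per node other than the origin, which is the minimum possible for a broadcast.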

Implementation of the OI on Chord

In this section we give an overview of the Chord network, because we modify it and implement our method on it. Chord [53] provides an efficient method of locating documents while placing few constraints on the applications that use it. Chord uses consistent hashing [27] to map nodes onto an m-bit circular identifier space. In particular, each identifier a is mapped to the node with the least identifier greater than or equal to a in the circular identifier space. This node is called the successor of a. To
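The successor mapping described above can be sketched as follows. This is a minimal illustration computed from a global list of node identifiers, not Chord's distributed lookup protocol. The identifier length `M`, the helper names `chord_id` and `successor`, and the reduction of a SHA-1 digest modulo 2^m are assumptions made for this example.

```python
import hashlib
from bisect import bisect_left

M = 8  # assumed identifier length in bits, so identifiers lie in [0, 2**M)


def chord_id(name, m=M):
    """Map a node name or key to an m-bit identifier (assumed hashing scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(identifier, node_ids):
    """Return the node with the least identifier >= identifier, wrapping around."""
    ring = sorted(node_ids)
    index = bisect_left(ring, identifier)
    # Past the largest node identifier, the ring wraps to the smallest one.
    return ring[index % len(ring)]


nodes = [1, 12, 40, 200]
owner_of_10 = successor(10, nodes)    # node 12
owner_of_12 = successor(12, nodes)    # node 12, since 12 >= 12
owner_of_201 = successor(201, nodes)  # wraps around to node 1
```

Using a sorted list with binary search keeps the example short; in Chord itself each node knows only part of the ring and the successor is found by routing.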

Experimental results

In this section, we first describe our experimental methodology. Then we analyse the experimental results in terms of the effect of different parameters on the effectiveness of the OI.

Comparison of the OI performance with another related work

In this section we compare the OI with a well-cited work reported in [49]. We select this work for comparison because it uses a shared ontology and a hypercube as the overlay, so the effectiveness of our work can be compared with that proposal using the same ontology. To do so, we simulate both the OI and [49] with the same ontology and configuration. In the rest of this section we refer to the work in [49] as Ontology based HyperCup (OH).

Fig. 16 shows that the percentage of search traffic

Conclusion

In this paper we introduced an ontology based indexing method (OI) which has the advantages of traditional indexing methods while avoiding some of their important disadvantages. We showed that the OI is a general method that can be implemented on many P2P networks and can be a base for future developments in this area. As case studies, we implemented the OI on the HyperCup and Chord peer-to-peer networks: Chord is a DHT based P2P network, whereas HyperCup is not DHT based. Then we

Habib Rostami received his B.S. degree in Computer Engineering from Sharif University of Technology in 2001, his M.S. degree in Computer Science in 2004, and his Ph.D. in Computer Engineering in 2008, from the same university. His research interests include P2P and Grid computing, distributed systems, semantic networks, and combinatorial optimization.

References (61)

  • T. Berners-Lee, J. Hendler, O. Lassila, The semantic web: A new form of web content that is meaningful to computers...
  • J. Broekstra, P. Haase, F.V. Harmelen, M. Menken, P. Mika, B. Schnizler, R. Siebes, Bibster-a semantics-based...
  • W. Cohen et al., Computing least common subsumers in description logics
  • B. Cooper, H. Garcia-Molina, Ad hoc, self-supervising peer-to-peer search networks. Technical Report, Gatech...
  • A. Crespo, H. Garcia-Molina, Routing indices for peer-to-peer systems, in: Proc. 28th Conference on Distributed...
  • F. Dabek, E. Brunskill, M.F. Kaashoek, D. Karger, R. Morris, I. Stoica, H. Balakrishnan, Building peer-to-peer systems...
  • M. Denny, Ontology building: A survey of editing tools. http://www.xml.com/pub/a/2002/11/06/ontologies.html....
  • M. Ehrig, C. Tempich, J. Broekstra, F.V. Harmelen, M. Sabou, R. Siebes, S. Staab, H. Stuckenschmidt, Swap —...
  • S. El-Ansary, L.O. Alima, P. Brand, S. Haridi, Efficient broadcast in structured P2P networks, in: IPTPS’03, Feb....
  • R. Ferreira et al., Semantic indexing in structured peer-to-peer networks, Journal of Parallel and Distributed Computing (2007)
  • A.A. Fisk, Gnutella ultrapeer query routing v0.1....
  • I. Foster et al., The anatomy of the grid: Enabling scalable virtual organizations, The International Journal of High Performance Computing Applications (March/April 2001)
  • M. Freedman, D. Mazieres, Sloppy hashing and self-organizing clusters, in: Proc. 2nd International Workshop on...
  • P. García et al., PlanetSim: A new overlay network simulation framework
  • L. Garces-Erice, K. Ross, E. Biersack, P. Felber, G. Urvoy-Keller, Topology-centric lookup service, in: Proc. The 5th...
  • Gnutella website....
  • M. Harren, M. Hellerstein, R. Huebsch, B.T. Loo, Complex queries in DHT-based peer-to-peer networks, in: The 1st...
  • N.J.A. Harvey, M.B. Jones, S. Saroiu, M. Theimer, A. Wolman, Skipnet: A scalable overlay network with practical...
  • K. Hwang et al., DHT-based security infrastructure for trusted internet and grid computing, International Journal of Critical Infrastructures (2006)
  • S. Joseph et al., Decentralized meta-data strategies: Effective peer-to-peer search, IEICE Transactions on Communications (2003)

    Jafar Habibi received his B.S. degree in Computer Engineering from the Supreme School of Computer, his M.S. degree in Industrial Engineering from Tarbiat Modares University, and his Ph.D. degree in Computer Engineering from Manchester University. At present, he is an associate professor in the Computer Engineering Department at Sharif University of Technology. He is a supervisor of Sharif’s RoboCup Simulation Group. His research interests are mainly in the areas of computer engineering, simulation systems, MIS, DSS, and evaluation of computer systems performance.

    Emad Livani received his B.S. and M.S. degrees in Computer Engineering from Sharif University of Technology. His research interests include ontology merging, computer networks, sensor networks, and simulation.

    This research was in part supported by a grant from IPM (Institute for Studies in Theoretical Physics and Mathematics).