On the self-organization of a hybrid peer-to-peer system

https://doi.org/10.1016/j.jnca.2009.08.002

Abstract

Decentralized peer-to-peer (P2P) systems can be classified as unstructured or structured. The former are easy to implement and often simply use flooding for search, which is effective only when target objects are popular or nearby. The latter require peers to cooperate closely to maintain an overlay topology that ensures an efficient routing path between any two nodes. Recently, hybrid use of both paradigms has gained popularity in several file sharing tools, so as to take advantage of each. What is lacking, and thus the purpose of the paper, is a fully decentralized algorithm to build such hybrid systems, as existing methods often require human intervention and some centralized gateway to select peers and guide them to build the structured overlay. The challenges include how to ensure that only one connected overlay is constructed in the absence of any global knowledge, and that only stable peers are selected for the structured overlay so as to reduce its maintenance cost. In addition, the construction must be efficient, scalable, robust, and easy to implement in a highly dynamic environment.

Introduction

According to the way overlay networks are organized, existing P2P systems can be roughly classified into four families: (decentralized) unstructured, (decentralized) structured, partially centralized, and hybrid systems.

In unstructured P2P systems (e.g., Gnutella), peers connect to one another at will, as the choice of neighbors is irrelevant to the correct functioning of the major operations (e.g., search) in the systems. Object placement is essentially arbitrary as well. Because there is no mechanism to gather object information or to infer a correlation between objects and the topology, a blind search process such as flooding is unavoidable when looking for objects of interest. The flexibility of the topology lets these systems adapt better to highly dynamic network environments. The blind search process imposes essentially no restrictions on queries, so complex searches such as keyword search or range queries can be conducted as well. However, flooding is costly and does not scale. Techniques like iterative deepening, directed BFS, routing indices (Yang and Garcia-Molina, 2002), probability indexing (Cheng and Joung, 2006), and random walk (Lv et al., 2002) have all improved the scalability and search performance of unstructured P2P systems, but the improvements are often limited when retrieving rare and distant objects.
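To make the cost of blind search concrete, the following is a minimal sketch of TTL-limited flooding over an adjacency-list graph. It is an illustration only, not the protocol of Gnutella or of any system cited here; the function name `flood_search`, the `graph` and `holders` arguments, and the choice to stop at the first hit are all assumptions made for the example.

```python
from collections import deque

def flood_search(graph, holders, start, ttl):
    """TTL-limited flooding (breadth-first) starting at `start`.

    graph   : dict mapping each peer to a list of its neighbors (hypothetical layout)
    holders : set of peers that store the target object
    Returns (found, messages_sent).
    """
    visited = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        peer, hops = frontier.popleft()
        if peer in holders:
            return True, messages
        if hops == ttl:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy of the query costs one message
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return False, messages

# A ring of 8 peers: an object 4 hops away is missed with ttl=3 but found with ttl=4.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(flood_search(ring, {4}, 0, 3)[0], flood_search(ring, {4}, 0, 4)[0])
```

The sketch shows both limitations named above: a distant object is simply not found when the TTL is too small, and raising the TTL increases the number of messages with every additional hop.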

In contrast, structured P2P systems (e.g., CAN (Ratnasamy et al., 2001) and Chord (Stoica et al., 2001)) have a well-structured overlay network topology to assist routing and object placement. Among them, distributed hash tables (DHTs) have emerged as the most popular scheme in this family. In DHTs, each peer acts as a hash table bucket of a globally agreed hash function. Objects are inserted into the network with a unique key. Search in DHTs is a guaranteed and efficient process: peers need only a small number of messages, typically logarithmic in the overlay network size, to locate target objects. DHTs, however, suffer from two serious problems: robustness and search flexibility. Efficient and effective structured P2P systems rely on close cooperation between peers to maintain their somewhat inflexible topologies, but P2P participants are often unstable and unreliable. In addition, because hash functions wipe out most information about objects, DHTs must find other ways to perform complex search. Extra data overlays have been proposed to facilitate complex search (Reynolds and Vahdat, 2003; Andrzejak and Xu, 2002), but they incur considerable overhead.
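The "hash table bucket" role of a peer can be sketched as follows. This is a minimal illustration of mapping a key to the peer responsible for it on an identifier ring, in the spirit of successor-based DHTs; it does not implement multi-hop routing, and the names `M`, `ident`, and `responsible_peer`, the 16-bit ring, and the use of SHA-1 are assumptions chosen for the example.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; a deliberately small, hypothetical ring

def ident(name):
    """Hash a peer name or object key onto the ring [0, 2**M)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def responsible_peer(key, peer_ids):
    """Return the first peer identifier at or after the key's identifier,
    wrapping around the ring. `peer_ids` must be sorted."""
    i = bisect_left(peer_ids, ident(key))
    return peer_ids[i % len(peer_ids)]

peers = sorted(ident(f"peer-{i}") for i in range(8))
print(responsible_peer("song.mp3", peers))
```

Because only the hashed identifier of the key is used, two keys that differ in one character generally land on unrelated peers, which is why keyword or range queries need extra machinery on top of a DHT.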

In partially centralized systems (e.g., Napster and recent versions of Gnutella), pre-established or elected super peers provide centralized services, typically object indexing, for ordinary peers. The use of centralized mechanisms greatly reduces system complexity, and also enables flexible and efficient search. These systems, however, are vulnerable to attacks if the number of super peers is relatively small. Even if the number is large, the lack of a well-coordinated network structure among super peers reduces their overall utilization and limits their further development into an Internet-scale system.

The final family, hybrid systems, combines approaches from the other families so that each compensates for the drawbacks of the others. For instance, flooding and DHT-based approaches are good at searching for popular and rare objects, respectively. Systems could therefore be made much more efficient by adopting different search approaches according to object popularity. This observation has been supported by live measurements of the Gnutella workload (Loo et al., 2004), and adopted by many popular file sharing tools, e.g., eMule, RevConnect, Overnet, and BitComet.
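One plausible way to choose the search approach by popularity, without knowing popularity in advance, is to try a cheap, shallow flood first and fall back to a key-based lookup only when the flood finds nothing. The sketch below assumes that policy; it is not taken from any of the tools named above, and `hybrid_search`, `flood`, `dht_lookup`, and the default `ttl` are hypothetical names and values.

```python
def hybrid_search(key, flood, dht_lookup, ttl=2):
    """Try a shallow flood first; fall back to the DHT on a miss.

    flood(key, ttl)  -> a result, or None if nothing was found nearby
    dht_lookup(key)  -> a result, or None if the key is not stored
    Returns (result, mode), where mode records which path answered.
    """
    hit = flood(key, ttl)
    if hit is not None:
        return hit, "flood"       # popular or nearby object: answered cheaply
    return dht_lookup(key), "dht"  # rare or distant object: key-based lookup

# Stand-in search functions for illustration only.
nearby = {"hit-song": "peer-3"}
index = {"hit-song": "peer-3", "rare-demo": "peer-91"}
print(hybrid_search("hit-song", lambda k, t: nearby.get(k), index.get))
print(hybrid_search("rare-demo", lambda k, t: nearby.get(k), index.get))
```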

The synergy of hybrid systems can be boosted by taking network heterogeneity into account. It is well understood that P2P nodes have quite different characteristics, e.g., uptime and processing power (Saroiu et al., 2002). Moreover, unstructured networks are more resilient to churn than their structured counterparts. By leveraging node heterogeneity, we can make the structured overlay more robust by selecting only powerful and stable nodes to serve in the overlay (Joung and Wang, 2007).

In this paper we add another system, called Envoy (see Fig. 1), to the hybrid family. Our design philosophy is in accord with the above observations. First, we use an unstructured P2P network as our base because of its simplicity of maintenance, its robustness in dynamic environments, its flexibility and efficiency in searching for popular objects, and, most importantly, its practical availability. Then we build a DHT overlay over the base to assist the search for distant and rare objects, thereby increasing scalability. Search in Envoy can thus operate in two modes: flexible search in the unstructured overlay, and guaranteed key-based search in the structured overlay.

The focus of the paper is on the construction of the overlay. An efficient, self-organizing mechanism for constructing the Envoy structured overlay without any centralized mechanism or global knowledge is nontrivial and, to our knowledge, has not been proposed in the literature. The challenges include how to automatically elect super peers to form the structured overlay so that the elected super peers are stable enough to reduce the maintenance cost (which is generally high for structured P2P networks) and powerful enough to provide services for the peers that elect them. Other challenges lie in preventing concurrently elected super peers from forming disjoint overlays in the absence of any centralized bootstrap server, and in making the algorithm efficient, scalable, robust, and easy to implement.


Related work

There have been several approaches to cope with network heterogeneity. The most popular way is to cluster peers, and select a super peer in each cluster as a local server to manage the cluster as well as to index objects in the cluster. Intra-cluster communication and lookup can therefore be efficiently done via the super peer of a cluster. The super peers also form an overlay to facilitate inter-cluster communication. The overlay is typically unstructured, e.g., KaZaA, Gia (Chawathe et al.,

Overview

Envoy is composed of two overlays: a structured one on top and an unstructured one at the bottom (see Fig. 1). Logically, the structured overlay is constructed from peers selected from the unstructured one. However, both overlays may be constructed simultaneously, or construction of the structured overlay may be deferred until after the unstructured one is in place. The former applies to the case where Envoy is built from scratch as a new P2P network, while the latter applies to the case where

Analysis

In this section we give a formal evaluation of the Envoy construction. We use $N$ to denote the total number of peers and $\bar{F}$ to denote the average faction size. $N/\bar{F}$ thus represents the number of factions in the system.

The fundamental operation in the construction is random walk. Basically, if random walk is used to search a network, and each node has equal probability $p$ to possess the search target, then the success rate of the random walk search is $1-(1-p)^{C}$, where $C$ is the search coverage
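As a quick numerical check of the success-rate expression above, read as 1 - (1 - p)^C, the sketch below evaluates it and inverts it to obtain the smallest coverage that reaches a target success rate. The function names `success_rate` and `coverage_needed` are hypothetical, and the inversion assumes 0 < p < 1 and 0 < target < 1.

```python
import math

def success_rate(p, coverage):
    """Probability that at least one of `coverage` visited nodes holds the
    target, when each node holds it independently with probability p."""
    return 1.0 - (1.0 - p) ** coverage

def coverage_needed(p, target):
    """Smallest integer coverage C with 1 - (1 - p)**C >= target.
    Assumes 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# With p = 0.01, visiting 100 nodes succeeds with probability of about 0.63.
print(round(success_rate(0.01, 100), 3))
print(coverage_needed(0.01, 0.9))
```

The inversion makes the trade-off explicit: when p is small, the coverage required for a fixed success rate grows roughly in proportion to 1/p.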

Simulation results

In this section we evaluate Envoy via simulation. As the main point of the paper is on the self-organization of a hybrid P2P system, we focus on the performance evaluation of the construction. For search performance in an Envoy-like hybrid system, one may refer to Loo et al. (2004) for some preliminary results. Moreover, since to our knowledge this is the first such construction to appear in the literature, there is no algorithm for us to compare with. Still, we have designed our simulation so

Conclusions and future work

We have presented Envoy, a two-layer P2P network where a structured overlay is built on top of an unstructured one. The purpose of the two-layer architecture is to combine the advantages of each structure and create synergy. For example, it is known that an unstructured P2P overlay is easy to build and maintain, and is quite effective in searching for popular and nearby objects as well as in handling complex search. A structured overlay, on the other hand, guarantees every search to be completed in

Acknowledgment

The authors would like to thank the anonymous referees for their invaluable comments and suggestions.

References (28)

  • Kostoulas D, Psaltoulis D, Gupta I, Birman K, Demers A. Decentralized schemes for size estimation in large and dynamic...
  • Liben-Nowell D, et al. Analysis of the evolution of peer-to-peer systems
  • Liben-Nowell D, et al. Observations on the dynamic evolution of peer-to-peer networks

  • Loo BT, Huebsch R, Stoica I, Hellerstein JM. The case for a hybrid P2P search infrastructure. In: Proceedings of the...

    This research is supported in part by the National Science Council, Taipei, Taiwan, Grants NSC 95-2221-E-002-058 and NSC 96-2628-E-002-026-MY3.
