Enabling portability in advanced information-centric services over structured peer-to-peer systems

https://doi.org/10.1016/j.jnca.2010.03.006

Abstract

A common limitation of existing distributed peer-to-peer systems is their lack of genericity. Typically, information-centric services (such as range queries) are deployed ad hoc onto a specific peer-to-peer overlay. Such solutions may be efficient, but they are not portable to other peer-to-peer infrastructures, and neither are the services and applications built on them. We believe, instead, that a peer-to-peer-generic solution is feasible. In this paper, we tackle the genericity and portability issue specifically on structured peer-to-peer networks (SPNs).

To do so, we introduce a distributed three-layer architecture that decouples applications (at the top layer) from the peer-to-peer network currently in use (at the bottom layer). Our middleware sits in the middle layer and is responsible for addressing two major challenges: (i) supporting complex, multi-dimensional application data domains and (ii) performing efficiently at large scale for a wide variety of information-centric services.

Broadly speaking, information-centric services are classified as data management services (such as range or spatial queries) and content distribution services (such as publish/subscribe), and our middleware is an umbrella for all of them. Note that data management services follow a pull mode (i.e., a user looks up information previously stored in the system), whereas content distribution services follow a push mode (i.e., the system delivers information to users in a timely manner).

The benefits of our approach are threefold: (i) our middleware can be easily deployed over existing SPNs, guaranteeing the portability of a critical mass of services and end-user applications; (ii) several services can be added to the middleware, which facilitates new synergies; and (iii) our middleware handles the application data domain transparently to services and applications, and includes the algorithms that services need to be deployed efficiently on it.

Introduction

Peer-to-peer (P2P) infrastructures play an increasingly prominent role in today's solutions. Examples appear in diverse scenarios: cloud computing (Amazon uses Dynamo), publish/subscribe services (such as Scribe on Pastry), and bibliographic sources (OverCite employs Chord). The motivation behind these P2P-enhanced technologies is to provide scalable and efficient distributed solutions suitable for thousands or millions of users.

To make all nodes cooperate in a P2P network, many P2P protocols have proposed different ways for nodes to communicate and coordinate. According to the way nodes are interconnected, these protocols are roughly classified into two categories: unstructured and structured. Structured peer-to-peer networks (SPNs) (Aberer et al., 2003, Ratnasamy et al., 2001, Rowstron and Druschel, 2001, Stoica et al., 2001) have demonstrated advantages over unstructured ones, such as efficiency, low communication cost, and search correctness (i.e., if an object exists in the system, it is found). This has motivated the extensive use of SPNs as the infrastructure on which services and end-user applications are developed.

The basic services that SPNs provide (i.e., put/get) are not enough for today's complex applications, though. In particular, end-user applications require high-level services to fulfil their purposes. Broadly speaking, we can organize services into two classes: data management and content distribution services. As examples of data management services, k-nearest neighbor (kNN) queries are an elementary part of document indexing applications (Tang et al., 2003), and range or window queries are the basis for geographical information services (Kovacevic et al., 2007). In contrast, publish/subscribe services for many-to-many communication, in either the topic-based (Castro et al., 2002a) or the content-based (Anceaume et al., 2006) model, distribute content among nodes. Each kind of service poses different challenges when developed on P2P infrastructures, such as efficient service provisioning and prompt query resolution.

Last but not least, current services must support complex data domains. Application data domains differ from one another. For instance, the data domain of a document indexing application is the set of keywords with which documents are described (Cacheda et al., 2005), whereas image databases use real-valued feature vectors of length F to identify images, where each feature characterises a property of the image (Kao, 2001). Since most SPNs use a uni-dimensional numerical keyspace, this mismatch poses a structural challenge that must be tackled within the proposed solution.

From a system viewpoint, we can classify existing solutions into one of the two most common approaches:

  • Constructing an ad-hoc P2P infrastructure to support explicitly the application data domain (Banaei-Kashani and Shahabi, 2004, Bharambe et al., 2004, Harvey et al., 2003, Kovacevic et al., 2007, Zhang et al., 2004).

  • Adapting the data domain in order to map a multi-dimensional data object to the uni-dimensional key of the SPN (Cai et al., 2003, Datta et al., 2005, Shu et al., 2005).

This data adaptation is performed by linearization functions, such as space-filling curves (SFC) (Sagan, 1994), order-preserving hash functions (OPHF) or locality-preserving hash functions (LPHF). Depending on its specific context, each solution selects one kind of function to fulfill the application requirements (e.g., SFCs in Shu et al., 2005; OPHFs in Datta et al., 2005; LPHFs in Cai et al., 2003). However, all these solutions are designed to be efficient only in their specific context, which brings a set of shortcomings to light:

  • Ad-hoc solutions: The vast majority of current P2P systems address a single problem in a given scenario, which leads to a lack of genericity in the whole solution. In addition, most of these systems build their solution on a specific P2P network.

  • Single service provisioning: Except in very few cases (like Zhang et al., 2004), systems do not provide multiple services in the same solution.

  • Non-portability: Whenever the proposed solution is tied to a given application data domain or a specific P2P network, the resulting system becomes non-portable to other applications and/or other P2P networks. Therefore, services have to be re-designed and re-implemented for each additional P2P system.

  • Maintenance: When new services are deployed as new (overlays of) overlay networks (like Scribe on Pastry), systems suffer from duplicated signalling traffic (i.e., traffic in any of the existing overlays). This traffic serves to maintain each overlay's properties, such as network connectivity or fan-out.
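As a concrete illustration of the linearization functions mentioned before this list, the sketch below maps a multi-dimensional point to a single integer key with a Z-order (Morton) curve, one well-known space-filling curve. It is not the adaptation function of any of the cited systems nor of the middleware described here; the function name `morton_key`, the non-negative integer grid coordinates, and the `bits` parameter are assumptions made for illustration only.

```python
def morton_key(point, bits):
    """Interleave the bits of each coordinate into one integer key (Z-order).

    point: tuple of non-negative integers, each smaller than 2**bits
           (hypothetical grid coordinates of a multi-dimensional object).
    bits:  number of bits kept per coordinate.
    """
    dims = len(point)
    key = 0
    for bit in range(bits):
        for d, coord in enumerate(point):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            key |= ((coord >> bit) & 1) << (bit * dims + d)
    return key


# A 2-D point on a 4-bit grid becomes an 8-bit uni-dimensional key.
key = morton_key((2, 0), bits=4)
```

Points that are close in the grid often receive close keys, which is what makes such curves attractive for range queries, although no curve can preserve locality for every pair of neighbouring points.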

Unlike previous solutions, our work aims to fill an existing gap in the field of P2P middleware with a generic solution. In particular, the contributions of this paper are as follows:

  1. Generic middleware: Our solution is flexible and extensible, so new services can be added at any time and automatically become available to end-user applications. In addition, the middleware is SPN-generic, so it can be deployed on most existing SPNs. This is a necessary property to ensure service portability and reuse.

  2. Application-independent: Our middleware provides a common data structure to services and applications of any kind, which makes them independent of the underlying SPN (and of the SPN keyspace). To do so, our middleware transparently automates the adaptation of the application data domain to the SPN keyspace. Our adaptation technique is customized for each application and unifies the model for accessing nodes in the SPN.

  3. Data multi-dimensionality: Modern applications work with complex data domains, commonly multi-dimensional. Our middleware successfully addresses this challenge, demonstrating better performance on higher dimensionalities. In addition, it provides the algorithms that services need to be deployed successfully and efficiently in our solution.

  4. Low maintenance: Our approach relies on the rendezvous model of SPNs to govern all supported services. Therefore, our middleware requires only a single SPN to work properly. In other words, we do not construct additional overlay networks for different services.
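The rendezvous idea behind the last contribution can be sketched as follows: subscriptions and events that hash to the same key are routed to the same responsible node, so they meet there without any extra overlay. This is only a single-process illustration under stated assumptions, not the protocol of this paper or of any particular SPN: the `Ring` class, the SHA-1 based `key_of` function, and the successor rule for choosing the responsible node are all hypothetical choices.

```python
import bisect
import hashlib


def key_of(name):
    """Hypothetical key function: hash a string into a 160-bit integer key."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class Ring:
    """Toy structured overlay: a key belongs to the first node id >= key."""

    def __init__(self, node_names):
        self.ids = sorted(key_of(n) for n in node_names)
        # Per-node subscription tables: node id -> {key: [subscribers]}.
        self.subs = {node_id: {} for node_id in self.ids}

    def responsible(self, key):
        # Successor rule with wrap-around (an assumption, Chord-like).
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def subscribe(self, topic, subscriber):
        key = key_of(topic)
        node = self.responsible(key)
        self.subs[node].setdefault(key, []).append(subscriber)
        return node

    def publish(self, topic, event):
        key = key_of(topic)
        node = self.responsible(key)  # same node as the subscriptions
        return node, [(s, event) for s in self.subs[node].get(key, [])]


ring = Ring(["n1", "n2", "n3", "n4"])
sub_node = ring.subscribe("sports", "alice")
pub_node, notifications = ring.publish("sports", "goal")
```

Because both operations use the same key and the same routing rule, the only state to maintain is that of the single underlying ring.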

As a proof of concept, we detail in this work three use cases, supported by different P2P networks, which demonstrate the feasibility and portability of the middleware, as well as the efficiency and scalability of our approach.

The rest of the article is structured as follows. Section 2 presents how existing solutions address the provisioning of new high-level information-centric services. In Section 3 we introduce our approach to providing a portable and scalable middleware. Our middleware is composed of three components, which are introduced in Sections 4 (Data adaptation module), 5 (Data management module) and 6 (Content distribution module). These sections also present the three use cases that help to demonstrate the feasibility of our approach. In Section 7 we evaluate the performance of the three use cases through representative simulation settings. We conclude the article in Section 8 with final remarks.

Section snippets

Related work and background

In this section we introduce the related work from an architectural viewpoint, as well as from the viewpoints of portability and scalability.

Middleware overview

The motivation behind our middleware is to provide genericity with respect to both the underlying SPN and the kinds of services offered. To this end, we have designed a three-layer infrastructure, in stark contrast to the tight two-layer organization shown in Section 2.1. To illustrate, see Fig. 2. Reading top-down, we first find the application layer, then our middleware in the middle layer, and finally the SPN at the bottom layer. However, to clarify the life cycle of

Data adaptation module

Before describing our module, let us introduce some necessary notation. SPNs can be defined by a set of nodes $P$ that cooperate to provide access to a set of objects $O$ (i.e., any kind of information being stored by users). The access is provided by means of a keyspace $I$ and the function $F_P: P \rightarrow I$, which constructs the node identifier, and an adaptation function $F_O: O \rightarrow I$, which builds a representative key for the given data object. This way, these functions establish the assignment of objects

Data management module

This module hosts two services for data management. The idea behind these services is to provide a store + lookup model. In other words, applications store their information in the system and then use these services to pull data from it.

We have designed storage and search algorithms to provide efficient search services for a wide range of applications. Particularly, our similarity query services (SQS for short) support exact match and range searches for
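A minimal sketch of the store + lookup (pull) model is given below. It is not the storage or search algorithm of this paper: the `RangeStore` class, the equal-width partition of an integer keyspace among nodes, and the in-memory lists are assumptions chosen only to show how a range lookup can contact just the nodes whose key interval overlaps the query.

```python
import bisect


class RangeStore:
    """Toy pull-mode store: each node owns a contiguous slice of the keyspace."""

    def __init__(self, keyspace_size, num_nodes):
        # Assumes keyspace_size is divisible by num_nodes.
        self.width = keyspace_size // num_nodes
        self.nodes = [[] for _ in range(num_nodes)]  # sorted (key, obj) pairs

    def _node_for(self, key):
        return min(key // self.width, len(self.nodes) - 1)

    def put(self, key, obj):
        # Store the object at the node responsible for its key.
        bisect.insort(self.nodes[self._node_for(key)], (key, obj))

    def get(self, key):
        # Exact-match lookup touches a single node.
        return [o for k, o in self.nodes[self._node_for(key)] if k == key]

    def range(self, lo, hi):
        # Range lookup visits only the nodes overlapping [lo, hi].
        results, visited = [], 0
        for n in range(self._node_for(lo), self._node_for(hi) + 1):
            visited += 1
            results.extend(o for k, o in self.nodes[n] if lo <= k <= hi)
        return results, visited


store = RangeStore(keyspace_size=64, num_nodes=4)
for k, obj in [(3, "a"), (17, "b"), (20, "c"), (50, "d")]:
    store.put(k, obj)
found, nodes_visited = store.range(10, 25)
```

In this toy setting the query for keys 10 to 25 contacts two of the four nodes; in a real SPN the cost would also include routing hops between nodes.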

Content distribution module

Content distribution services can be seen as mechanisms that bring information to interested participants in a push mode. Broadly speaking, participating nodes register their interests (subscriptions) in specific kinds of data with the system. When new data (events) match the given interests, the corresponding nodes receive a notification that includes the matching data objects. Application-level multicast and publish/subscribe techniques embody this kind of service. In our case, we provide CAPS, a

Evaluation

In this section we present the simulation-based evaluation of all three services. The results agree with the theoretical analysis shown in the following lines and show that all three services, by means of our data adaptation technique and our middleware infrastructure, are efficient and scalable.

Conclusions

This work was motivated by the lack of genericity in the way information-centric services, such as data management services (e.g., range queries or spatial queries) or content distribution services (e.g., publish/subscribe services), are currently designed and deployed on structured peer-to-peer networks (SPNs). These existing solutions suffer from the following shortcomings: maintenance overhead, lack of generic solutions, non-portable services and applications, and lack of structural support

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which have greatly contributed to improving the quality of the original manuscript.

References (33)

  • F. Cacheda et al. A case study of distributed information retrieval architectures to index one terabyte of text. Information Processing and Management: An International Journal (2005)
  • C. Tang et al. pSearch: information retrieval in structured overlays. ACM SIGCOMM Computer Communication Review (2003)
  • K. Aberer et al. P-Grid: a self-organizing structured P2P system. SIGMOD Record (2003)
  • Anceaume E, Gradinariu M, Datta AK, Simon G, Virgillito A. A semantic overlay for self-* peer-to-peer...
  • Aspnes J, Shah G. Skip graphs. In: Proceedings of 14th annual ACM-SIAM symposium on discrete algorithms, 2003. p....
  • Baldoni R, Marchetti C, Virgillito A, Vitenberg R. Content-based publish-subscribe over structured overlay networks....
  • Banaei-Kashani F, Shahabi C. Swam: a family of access methods for similarity-search in peer-to-peer data networks. In:...
  • A.R. Bharambe et al. Mercury: supporting scalable multi-attribute range queries. ACM SIGCOMM Computer Communication Review (2004)
  • Cai M, Frank M, Chen J, Szekely P. MAAN: a multi-attribute addressable network for grid information services. In:...
  • M. Castro et al. Scribe: a large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in Communications (2002)
  • Castro M, Druschel P, Kermarrec A.-M, Rowstron A. One ring to rule them all: service discovery and binding in...
  • Dabek F, Zhao BY, Druschel P, Kubiatowicz J, Stoica I. Towards a common api for structured peer-to -peer overlays. In:...
  • Datta A, Hauswirth M, John R, Schmidt R, Aberer K. Range queries in trie-structured overlays. In: Proceedings of the...
  • Druschel P, Rowstron A. Past: a large-scale, persistent peer-to-peer storage utility. In: Proceedings of the...
  • El-Ansary S, Alima L, Brand P, Haridi S. Efficient broadcast in structured p2p networks. In: Proceedings of second...
  • Harvey N, Jones MB, Saroiu S, Theimer M, Wolman A. Skipnet: a scalable overlay network with practical locality...