Enabling portability in advanced information-centric services over structured peer-to-peer systems
Introduction
Peer-to-peer (P2P) infrastructures play an increasingly important role in today's solutions. Examples appear in several different scenarios: in cloud computing (Amazon uses Dynamo), in publish/subscribe services (such as Scribe on Pastry) and in bibliographical sources (OverCite employs Chord). The motivation behind these P2P-enhanced technologies is to provide scalable and efficient distributed solutions, suitable for thousands or millions of users.
To make all nodes cooperate in a P2P network, many P2P protocols provide ways for nodes to communicate and coordinate with each other. According to the way nodes are interconnected, these protocols are roughly classified into two categories: unstructured and structured. Structured peer-to-peer networks (SPNs) (Aberer et al., 2003, Ratnasamy et al., 2001, Rowstron and Druschel, 2001, Stoica et al., 2001) have demonstrated advantages over unstructured ones, such as efficiency, low communication cost and search correctness (i.e., if an object exists in the system, it is found). This has motivated extensive use of SPNs as the infrastructure on which to develop services and end-user applications.
However, the basic services that SPNs provide (i.e., put/get) are not enough for today's complex applications. In particular, end-user applications need high-level services to fulfill their purposes. Broadly speaking, we can organize services into two classes: data management services and content distribution services. As examples of data management services, k-nearest neighbor (kNN) queries are an elementary part of document indexing applications (Tang et al., 2003), and range or window queries are the basis for geographical information services (Kovacevic et al., 2007). In contrast, publish/subscribe services for many-to-many communication, either in the topic-based (Castro et al., 2002a) or content-based (Anceaume et al., 2006) model, distribute content among nodes. Each kind of service poses different challenges when developed on P2P infrastructures, such as efficient service provisioning and prompt query resolution.
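To make the put/get primitive concrete, the following is a minimal sketch of a key-based store on a ring-shaped keyspace. It is not taken from any of the cited systems: the class `ToyRing`, the 16-bit keyspace and the successor-based ownership rule are assumptions chosen for readability, and all nodes live in one process instead of being reached by routing.

```python
import hashlib
from bisect import bisect_left

KEY_BITS = 16  # assumption: a tiny keyspace, only for readability


def key_of(name: str) -> int:
    """Hash a string to an integer key in [0, 2**KEY_BITS)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << KEY_BITS)


class ToyRing:
    """Hypothetical stand-in for an SPN: a key is owned by its successor node."""

    def __init__(self, node_names):
        # Node identifiers are derived from node names with the same hash.
        self.node_ids = sorted({key_of(name) for name in node_names})
        self.storage = {node_id: {} for node_id in self.node_ids}

    def responsible_node(self, key: int) -> int:
        """Return the first node id >= key, wrapping around the ring."""
        idx = bisect_left(self.node_ids, key)
        return self.node_ids[idx % len(self.node_ids)]

    def put(self, name: str, value) -> None:
        key = key_of(name)
        self.storage[self.responsible_node(key)][key] = value

    def get(self, name: str):
        key = key_of(name)
        return self.storage[self.responsible_node(key)].get(key)
```

Because put and get hash the same name to the same key, both reach the same node. This is also why exact-match lookups come naturally in such a system, while kNN or range queries over multi-dimensional data need additional machinery.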
Last but not least, current services have to support complex data domains. Application data domains differ from one another. For instance, the data domain of a document indexing application is the set of keywords with which documents are described (Cacheda et al., 2005); image databases use real-valued feature vectors of length F to identify images, where each feature characterises a property of the image (Kao, 2001). Since most SPNs use a uni-dimensional numerical keyspace, this poses a structural challenge that must be tackled within the proposed solution.
From a system viewpoint, we can classify existing solutions into one of the two most common approaches:
- Constructing an ad-hoc P2P infrastructure to explicitly support the application data domain (Banaei-Kashani and Shahabi, 2004, Bharambe et al., 2004, Harvey et al., 2003, Kovacevic et al., 2007, Zhang et al., 2004).
- Adapting the data domain in order to map a multi-dimensional data object to the uni-dimensional key of the SPN (Cai et al., 2003, Datta et al., 2005, Shu et al., 2005).
This data adaptation is performed by linearization functions, such as space-filling curves (SFC) (Sagan, 1994), order-preserving hash functions (OPHF) or locality-preserving hash functions (LPHF). Depending on the specific context, each solution selects one kind of function to fulfill the application requirements (e.g., systems using SFCs, Shu et al., 2005, OPHF, Datta et al., 2005 or LPHF, Cai et al., 2003). However, all these solutions are designed to be efficient in one specific context, which brings a set of shortcomings to light:
- Ad-hoc solutions: The vast majority of current P2P systems address a single problem in a given scenario, which makes the whole solution less generic. In addition, most of these systems build their solution on specific P2P networks.
- Single service provisioning: Except in very few cases (like Zhang et al., 2004), systems do not provide multiple services in the same solution.
- Non-portability: Whenever the proposed solution is tied to a given application data domain or a specific P2P network, the resulting system becomes non-portable to other applications and/or other P2P networks. As a consequence, services have to be re-designed and re-implemented for each additional P2P system.
- Maintenance: When new services are deployed as new (overlays of) overlay networks (like Scribe on Pastry), systems suffer from duplicated signalling traffic (i.e., traffic in each of the existing overlays). This traffic is needed to maintain the overlay's properties, such as network connectivity or fan-out.
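As an illustration of the linearization functions mentioned above, the following sketch maps a multi-dimensional point to a single integer key with a Z-order (Morton) curve, one common space-filling curve. It is only an example of the general technique, not the adaptation function of any cited system or of this work; the helper names `quantize` and `z_order_key` and the choice of bit widths are assumptions.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a real value in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    if not lo <= value <= hi:
        raise ValueError("value outside the declared domain")
    cells = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * cells)


def z_order_key(coords, bits: int) -> int:
    """Interleave the low `bits` bits of each coordinate into one Morton key.

    Bit b of dimension d ends up at position b * len(coords) + d of the key.
    """
    dims = len(coords)
    key = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * dims + d)
    return key


# Usage: a 2-dimensional feature vector in [0, 1]^2 becomes a 8-bit key.
point = (0.0, 1.0)
cells = [quantize(x, 0.0, 1.0, bits=4) for x in point]
key = z_order_key(cells, bits=4)
```

Points that are close in the original space often, though not always, receive nearby keys. This partial locality is what lets range or similarity queries be translated into a limited number of key intervals, and it is also the reason why the choice of linearization function depends on the application.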
Unlike previous solutions, our work aims to fill an existing gap in the field of P2P middleware with a generic solution. In particular, the contributions of this paper are as follows:
1. Generic middleware: Our solution is flexible and extensible, so new services can be added at any time and automatically become available to end-user applications. In addition, this middleware is SPN-generic, so it can be deployed on most existing SPNs. This is a necessary property to ensure service portability and reuse.
2. Application-independent: Our middleware provides a common data structure to services and applications of any kind, which makes them independent of the underlying SPN (and of the SPN keyspace). To do so, our middleware transparently automates the adaptation of the application data domain to the SPN keyspace. Our adaptation technique is customized for each application and unifies the model for accessing nodes in the SPN.
3. Data multi-dimensionality: Modern applications work with complex data domains, commonly multi-dimensional. Our middleware successfully addresses this challenge, demonstrating better performance on higher dimensionalities. In addition, it provides the algorithms that services need in order to be deployed successfully and efficiently on our solution.
4. Low maintenance: Our approach relies on the rendezvous model of SPNs to govern all supported services. Therefore, our middleware requires only one SPN to work properly. In other words, we do not construct additional overlay networks for different services.
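The rendezvous idea behind the last contribution can be pictured with a small topic-based sketch: subscriptions and events for the same topic hash to the same key, so they meet at the node responsible for that key and no extra overlay is required. This is a generic illustration and not the service described in this article; the class `RendezvousPubSub`, the 16-bit keyspace and the in-process node tables are assumptions.

```python
import hashlib
from collections import defaultdict


def topic_key(topic: str, key_bits: int = 16) -> int:
    """Hash a topic name to a key in [0, 2**key_bits)."""
    digest = hashlib.sha1(topic.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << key_bits)


class RendezvousPubSub:
    """Toy topic-based publish/subscribe on a single keyspace (hypothetical)."""

    def __init__(self, node_ids):
        self.node_ids = sorted(set(node_ids))
        # One subscription table per node: topic -> set of subscribers.
        self.tables = {n: defaultdict(set) for n in self.node_ids}

    def rendezvous_node(self, topic: str) -> int:
        """Return the first node id >= the topic key, wrapping around."""
        key = topic_key(topic)
        for node_id in self.node_ids:
            if node_id >= key:
                return node_id
        return self.node_ids[0]

    def subscribe(self, subscriber: str, topic: str) -> None:
        self.tables[self.rendezvous_node(topic)][topic].add(subscriber)

    def publish(self, topic: str, event) -> dict:
        """Return the notifications produced at the rendezvous node."""
        table = self.tables[self.rendezvous_node(topic)]
        return {s: event for s in sorted(table.get(topic, ()))}
```

In a real SPN the subscribe and publish calls would be routed messages, and the only maintenance traffic would be that of the SPN itself.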
As a proof of concept, we detail in this work three use cases, supported by different P2P networks, which demonstrate the feasibility and portability of the middleware, as well as the efficiency and scalability of our approach.
The rest of the article is structured as follows. Section 2 presents how existing solutions address the provisioning of new high-level information-centric services. In Section 3 we introduce our approach to providing a portable and scalable middleware. Our middleware is composed of three components, which are introduced in Section 4 (Data adaptation module), Section 5 (Data management module) and Section 6 (Content distribution module). These sections also present the three use cases that help to demonstrate the feasibility of our approach. In Section 7 we evaluate the performance of the three use cases through representative simulation settings. We conclude the article in Section 8 with final remarks.
Section snippets
Related work and background
In this section we introduce the related work from an architectural viewpoint, as well as from the viewpoints of portability and scalability.
Middleware overview
The motivation behind our middleware is to provide genericity with respect to both the underlying SPN and the kinds of services it features. To this end, we have designed a three-layer infrastructure, in stark contrast to the tight two-layer organization shown in Section 2.1. To illustrate, see Fig. 2. Reading from the top down, we first observe the application layer, then our middleware in the middle layer, and finally the SPN in the bottom layer. However, to clarify the life cycle of
Data adaptation module
Before describing our module, let us introduce some necessary notation. SPNs can be defined by a set of nodes that cooperate together to provide access to a set of objects (i.e., any kind of information being stored by users). The access is provided by means of a keyspace and the function , which constructs the node identifier, and an adaptation function , which builds a representative key for the given data object. This way, these functions establish the assignment of objects
Data management module
This module hosts two services for data management. The idea behind these services is to provide a store + lookup model. In other words, applications store their information in the system and then use these services to pull data from the system.
We have designed storage and search algorithms to provide efficient search services for a wide range of applications. In particular, our similarity query services (SQS for short) support exact match and range searches for
Content distribution module
Content distribution services can be seen as mechanisms to bring information to interested participants in a push mode. Broadly speaking, participating nodes register their interests (subscriptions) in specific kinds of data with the system. When new data (events) match the given interests, the corresponding nodes receive a notification including the matching data objects. Application-level multicast and publish/subscribe techniques embody this kind of service. In our case, we provide CAPS, a
Evaluation
In this section we present the simulation-based evaluation of all three services. The results agree with the theoretical analysis presented in the following lines and show that all three services, by means of our data adaptation technique and our middleware infrastructure, are efficient and scalable.
Conclusions
This work was motivated by the lack of genericity in the way information-centric services, such as data management services (e.g., range queries or spatial queries) or content distribution services (e.g., publish/subscribe services), are currently designed and deployed on structured peer-to-peer networks (SPNs). These existing solutions suffer from the following shortcomings: maintenance overhead, lack of generic solutions, non-portable services and applications, and lack of structural support
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which have greatly contributed to improving the quality of the original manuscript.
References (33)
- et al. A case study of distributed information retrieval architectures to index one terabyte of text. Information Processing and Management: An International Journal (2005)
- et al. psearch: information retrieval in structured overlays. ACM SIGCOMM Computer Communication Review (2003)
- et al. P-grid: a self-organizing structured p2p system. SIGMOD Record (2003)
- Anceaume E, Gradinariu M, Datta AK, Simon G, Virgillito A. A semantic overlay for self-* peer-to-peer...
- Aspnes J, Shah G. Skip graphs. In: Proceedings of 14th annual ACM-SIAM symposium on discrete algorithms, 2003. p....
- Baldoni R, Marchetti C, Virgillito A, Vitenberg R. Content-based publish-subscribe over structured overlay networks....
- Banaei-Kashani F, Shahabi C. Swam: a family of access methods for similarity-search in peer-to-peer data networks. In:...
- et al. Mercury: supporting scalable multi-attribute range queries. ACM SIGCOMM Computer Communications Review (2004)
- Cai M, Frank M, Chen J, Szekely P. MAAN: a multi-attribute addressable network for grid information services. In:...
- et al. Scribe: a large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in Communications (2002)