AFT: Adaptive and fault tolerant peer-to-peer overlay—A user-centric solution for data sharing

https://doi.org/10.1016/j.future.2016.05.022

Highlights

  • AFT: User-centric network for data-sharing.

  • Fault-tolerant and adaptive Peer-to-Peer overlay.

  • Comparison of different Peer-to-Peer overlays.

  • Performance evaluation of network construction.

Abstract

The widespread availability of interconnectable computers gives systems the chance to operate more efficiently by better exploiting the cooperation between individual components. User-centric solutions address the devices themselves and, since there is neither a network infrastructure nor a device powerful enough to assume the role of a coordinator, adopting a peer-to-peer model tends to be the best solution. In this paper we propose AFT, an overlay that adapts to a changing number of nodes, is resilient to faults and is the foundation for an efficient implementation of a reputation-based trust system. The AFT overlay is designed to be a solution for systems that need to share transient information and synchronize their components, as in mobile ad-hoc networks, M2M networks, urban networks, and wireless sensor networks. The operations supported by the overlay, such as joining, leaving, unicast transmission, broadcast sharing and maintenance, can be accomplished in a duration belonging to O(√N), where N is the number of nodes that are part of the structure. We prove these properties and evaluate the time performance of overlay creation and node joining.

Introduction

With the rapid adoption of ubiquitous computing, smart devices and computers tend to appear everywhere, in various forms ranging from powerful servers to smart-phones and lightweight sensors. Because they are so numerous and so widely scattered, their usefulness and efficiency can be increased if these systems manage to communicate with each other. The challenge becomes even greater when we take into consideration the fact that computers may misbehave, acting in a way that destabilizes the entire system. Such a situation can occur as a result of a malfunction, or it can be the very purpose the device was created for.

While the previous motivation for the use of a distributed system comes naturally, as the components are weakly linked and can enter or exit the network at any given time, there are also other paths that lead to a solution based on this type of system. The fact that single-computer architectures and even centralized systems scale only up to a limit generates demand for alternatives. This limit may not be an important factor at the beginning, when the number of users is low, but as time passes and more participants come into play it becomes an impediment to growth [1].

A centralized system both has performance issues and is more susceptible to failures. Basically, we face a trade-off between simplicity and fault tolerance. In a purely distributed system the role of a given node can be fulfilled by any other peer. This change may generate other modifications in the structure of the network (for example, in the overlay-specific naming of the nodes, or for interactive broadcasting [2]), but after running a specialized procedure the absence of the initial peer no longer affects the system.

For all applications based on content distribution networks, decentralization over different P2P overlays improves efficiency. P2P models and technologies have been successfully used for on-line communities, where users post content (video, images, text) in order to share it. In these communities, similarity search is supported by semantic overlay networks, where data management, data integration and document retrieval systems play an important role. An open issue in very large-scale P2P systems is how to choose a representative answer for a user query, when there may be a huge number of answers, most of them uninteresting or redundant [3], [4].

Because applications that require large processing power and scalability vary widely, a system that sits on top of the traditional networking technologies, but below the application layer, is needed. Reducing the problem to its bare essentials, what we need is a manner of organizing devices such that they can cooperate with low penalties, one that adapts to changes and enables fast data sharing [5]. The first and third properties should be analysed relative to centralized models or, by using the transitivity of the “better than” relation, relative to other distributed systems that have proven successful at managing given issues. Other challenges for Peer-to-Peer overlays are related to interoperability for mobile nodes [6], optimization of throughput [7], and performance modelling and evaluation [8], [9].

The above gives rise to the challenge we address in this study: we seek to organize the computers in a decentralized, self-organizing overlay network that can handle failing devices. Such a structure should also adapt to a varying number of peers and assign equal importance to both unicast and broadcast data traffic. While this will not directly deal with ill-intentioned peers, it will provide support for adequate solutions. This is AFT, an Adaptive and Fault-Tolerant peer-to-peer overlay, with nodes virtually linked in a torus, which is designed to be a user-centric solution for data dissemination and sharing. As an example, we can augment the network with a reputation-based trust layer that filters out disturbing devices. One of the requirements for such a trust system is that the layers below it provide a quick manner in which nodes can share data with all other peers and learn from them at the same time [10].

To clarify why the previously mentioned property is needed, we analysed the SecuredTrust model [11]. As in many other systems that rely on the ability to compute the reputation of their peers, this value is periodically recomputed for each node. The way in which this is done mirrors the manner in which we would do it in real life when talking about the reputation of a person: we would take into account the feedback of others after they interacted with the subject.

At certain moments in time, nodes can synchronize on how they perceived the interactions with the peers they came in contact with. This leads us precisely to the requirement stated previously: nodes should be able to send data to, and receive data from, other peers without wasting much time.
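As a toy illustration only (the weighting below is an assumption made for this sketch, not the actual SecuredTrust formula), such a synchronization round could let a node blend its own experience with the feedback it gathered from other peers:

```cpp
#include <vector>

// Toy reputation update: blend a node's own experience with the feedback
// received from other peers during a synchronization round. The 50/50
// weighting is purely illustrative and is not the SecuredTrust formula.
double updateReputation(double ownExperience,
                        const std::vector<double>& peerFeedback) {
    if (peerFeedback.empty())
        return ownExperience;
    double sum = 0.0;
    for (double f : peerFeedback)
        sum += f;
    const double reported = sum / static_cast<double>(peerFeedback.size());
    return 0.5 * ownExperience + 0.5 * reported;
}
```

Whatever the exact aggregation formula, the cost of such a round is dominated by how fast the overlay lets a node reach all its peers, which is exactly the data-sharing requirement stated above.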

The need to handle devices that may, at some point, fail is a key feature of any system that has to face real-life scenarios. Hardware tends to misbehave after being used for a certain period of time, and such an unfortunate event should not take down the entire network, either by causing healthy nodes to misbehave or simply by degrading performance to the point of unusability.

The same is true for adaptability: the overlay should modify its structure depending on the number of nodes that compose it, in order to operate efficiently in the given situation. This also means that performance should degrade as little as possible when the number of peers increases.

To summarize, the main contributions of this paper are as follows.

  1. AFT, a user-centric network for data-sharing, structured as a torus, with joining, leaving, unicast transmission, broadcast sharing and maintenance operations of O(√N) cost.

  2. A fault-tolerant and adaptive Peer-to-Peer overlay with a specific consolidation procedure, a resilience scheme and a mechanism for contacting a ring from the overlay.

  3. A comparison of the proposed overlay with two other Peer-to-Peer overlays: Chord and HoneyComb.

  4. A detailed performance evaluation of the network construction and a theoretical evaluation of the operation costs of the proposed overlay.

The rest of the paper is structured in 7 sections as follows. Section 2 presents the related work and existing solutions. Section 3 gives a general description of the proposed design, leaving the next two sections, Section 4 and Section 5, to present the details regarding the important characteristics of the overlay: adaptability and fault tolerance. Aside from formally proving the properties of our solution, in Section 6 we present experimental measurements that reassert these features. Section 7 presents a comparison between AFT and other Peer-to-Peer overlays. The conclusions and future development ideas are drawn in the final part of the paper, in Section 8.

Section snippets

Related work

Many designs have been proposed for solving problems related to organizing nodes in a Peer-to-Peer infrastructure, each of them providing a set of characteristics meant to simplify, or even take over, duties that would otherwise burden the upper layers.

The Content Addressable Network [12], or CAN for short, is a structure that aims to ensure scalability, fault tolerance and self-organization capabilities by creating a multi-dimensional Cartesian coordinate space. Its intended purpose is to

Overlay structural layout

The overlay is structured as a torus. We call a “ring” a circle (composed of peers) that is perpendicular to the plane of the inner circle, and a “chain” a circle lying in a plane parallel to that of the inner circle.

For a structure containing N peers, there should be exactly p peers on each ring, aside from the last ring, which can have fewer peers. The value of p is given by p = ⌈√N⌉ (Eq. (1)).

Each node should only exchange data with peers adjacent to it in the same ring or chain. Still, a node should
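To make the layout concrete, here is a minimal sketch of one possible labelling, assuming nodes are numbered 0..N−1 and fill the torus ring by ring with p = ⌈√N⌉ peers per ring; the concrete naming scheme and neighbour handling used by the overlay may differ, this is only an illustration consistent with the description above:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed labelling: nodes 0..N-1 fill the torus ring by ring, with
// p = ceil(sqrt(N)) peers per ring; the last ring may be partial.
struct TorusPosition {
    uint64_t ring;    // which ring the node sits on
    uint64_t offset;  // position of the node inside its ring
};

uint64_t peersPerRing(uint64_t n) {
    return static_cast<uint64_t>(std::ceil(std::sqrt(static_cast<double>(n))));
}

TorusPosition locate(uint64_t id, uint64_t n) {
    const uint64_t p = peersPerRing(n);
    return {id / p, id % p};
}

// Peers a node exchanges data with: its two neighbours on the same ring and
// the peers with the same offset on the two adjacent rings (its chain).
// Edge cases for very small N are ignored for brevity.
std::vector<uint64_t> neighbours(uint64_t id, uint64_t n) {
    const uint64_t p = peersPerRing(n);
    const uint64_t ring = id / p;
    const uint64_t off = id % p;
    const uint64_t ringStart = ring * p;
    const uint64_t ringSize = std::min(p, n - ringStart);  // last ring may be partial
    const uint64_t numRings = (n + p - 1) / p;

    std::vector<uint64_t> out;
    // Ring neighbours (wrap around inside the ring).
    out.push_back(ringStart + (off + 1) % ringSize);
    out.push_back(ringStart + (off + ringSize - 1) % ringSize);
    // Chain neighbours (previous/next ring, same offset, wrapping over the torus);
    // an offset missing from the partial last ring is simply skipped.
    const uint64_t up = ((ring + 1) % numRings) * p + off;
    const uint64_t down = ((ring + numRings - 1) % numRings) * p + off;
    if (up < n) out.push_back(up);
    if (down < n) out.push_back(down);
    return out;
}
```

With such a labelling, a node only needs to know its own identifier and N to determine both its position on the torus and the small, constant-size set of peers it exchanges data with.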

Adaptability of AFT overlay

As we can see from the definition of p (see Eq. (1)), the overlay’s structure changes depending on the number of peers that compose the network. Thus, the number of peers per ring is incremented when the network contains p complete rings, another 2 complete rings and one extra node, i.e. N = p² + 2p + 1 = (p + 1)². Similarly, the number of peers per ring is decremented when the overlay contains p − 2 complete rings and one node, i.e. N = p² − 2p + 1 = (p − 1)². We can summarize this as follows:

p_{t+1} = p_t + 1, if N = p² + 2p + 1
          p_t − 1, if N = p² − 2p + 1
          p_t,     otherwise.          (2)

Eq. (2) denotes 2
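A minimal sketch of how this adaptation rule can be applied after each join or leave, written as a straightforward transcription of Eq. (2):

```cpp
#include <cstdint>

// Direct transcription of the adaptation rule in Eq. (2): after every join or
// leave, the overlay re-evaluates the number of peers per ring.
//   N is the current number of nodes, p the current peers-per-ring value.
uint64_t adaptPeersPerRing(uint64_t p, uint64_t N) {
    if (N == p * p + 2 * p + 1)             // N == (p + 1)^2: grow the rings
        return p + 1;
    if (p >= 2 && N == p * p - 2 * p + 1)   // N == (p - 1)^2: shrink the rings
        return p - 1;
    return p;                               // otherwise the layout is unchanged
}
```

Note the hysteresis built into the rule: p only changes when N reaches the next or the previous perfect square, so small join/leave oscillations around a boundary do not force the overlay to restructure repeatedly.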

Fault tolerance of AFT overlay

One of the main objectives of the AFT overlay is to provide support for cases in which nodes become unresponsive. Also, under the assumption that a trust layer is implemented on top of the overlay, a mechanism that enables it to expel a peer from the system is needed in order to keep the trust of all nodes within a given range. The same mechanism can be applied if we want to keep in the overlay only nodes that achieve certain values for given performance metrics.

All the previous issues boil down
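The following is an illustrative sketch of the kind of local check that could trigger an eviction, assuming a heartbeat-based failure detector and a reputation value maintained by the optional trust layer; the concrete thresholds and the consolidation procedure that follows an eviction are part of the overlay design and are not shown here:

```cpp
#include <chrono>

// Illustrative local eviction check, assuming a heartbeat-based failure
// detector and a reputation value maintained by the (optional) trust layer.
// The concrete thresholds and the consolidation procedure that follows an
// eviction are outside the scope of this sketch.
using Clock = std::chrono::steady_clock;

struct PeerState {
    Clock::time_point lastHeartbeat;  // last time this neighbour was heard from
    double reputation = 1.0;          // value maintained by the trust layer
};

bool shouldEvict(const PeerState& peer,
                 Clock::time_point now,
                 std::chrono::seconds heartbeatTimeout,
                 double minReputation) {
    const bool unresponsive = (now - peer.lastHeartbeat) > heartbeatTimeout;
    const bool untrusted = peer.reputation < minReputation;
    return unresponsive || untrusted;
}
```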

Implementation details

For the simulation environment used to obtain experimental results for the AFT overlay, we decided to go with OverSim [21]. As mentioned on its website, OverSim is based on OMNeT++ [22] and intends to be an overlay and peer-to-peer network simulation framework.

Because the nodes that represent individual terminals in the simulation environment have a structure based on a multi-tiered module hierarchy, we only had to implement the overlay-specific module. The

Comparison with other overlays

When proposing a new solution in any field of research, it is always a good idea to analyse how it compares to other similar solutions. Such a process does not always yield a clear winner, as many times we come to the conclusion that the answer depends on the situation considered. This is because an environment is influenced by many parameters, and not all of them have a positive influence on the resources consumed by every system.

A good outcome is one that involves finding elements that

Conclusions and future work

The AFT overlay is a way to organize a network which has the primary purpose of enabling real-time data sharing between its members. The peers are organized in a torus shaped structure, being positioned at the intersection between a ring and a chain. Devices fill the empty spots ring by ring and measures are taken so that the overlay is kept this way every time a node leaves its place.

Adaptability is a key feature of AFT and comes from the number of nodes per ring, p: as the network grows or

Acknowledgements

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-II-RU-TE-2014-4-2731 - DataWay: Real-time Data Processing Platform for Smart Cities: Making sense of Big Data.

References (23)

  • E. Barbierato et al., A performance modeling language for big data architectures

Andrei Poenaru, Computer Science diplomat engineer of University Politehnica of Bucharest, is an active member of Distributed Systems Laboratory. His research interests are in Peer-to-Peer systems and Cloud Computing, especially in data management and data sharing in Peer-to-Peer, Cloud middleware tools, distributed applications design and implementation. He did an internship at Dropbox in 2015 and is currently pursuing a Masters Degree in Distributed Systems at ETH Zurich.

Roxana Istrate, Computer Science diplomat engineer of University Politehnica of Bucharest, is an active member of Distributed Systems Laboratory. Her research interests are in Cloud System and Many Task Computing, especially in resource-aware scheduling, multi-criteria optimization, Cloud middleware tools, advance reservation, distributed applications design and implementation. She attended an internship at IBM Research, Zürich in summer 2015 and is currently pursuing a Ph.D. Degree in Cognitive Computing.

Florin Pop, Ph.D. Habil., diplomat engineer, is associate professor within the Computer Science Department in University Politehnica of Bucharest and an active member of Distributed System Laboratory. His general research interests are: Large-Scale Distributed Systems (design and performance), Grid Computing and Cloud Computing, Peer-to-Peer Systems, Big Data Management, Data Aggregation, Information Retrieval and Ranking Techniques, Bio-Inspired Optimization Methods.
