Elsevier

Computer Networks

Volume 113, 11 February 2017, Pages 1-16
Fault-tolerant bandwidth reservation strategies for data transfers in high-performance networks

https://doi.org/10.1016/j.comnet.2016.11.003

Abstract

Many next-generation e-science applications require fast and reliable transfer of large volumes of data with guaranteed performance, which is typically enabled by the bandwidth reservation service in high-performance networks. One prominent issue in such network environments with large footprints is that node and link failures are inevitable, hence potentially degrading the quality of data transfer. We consider two generic types of bandwidth reservation requests (BRRs) concerning data transfer reliability: (i) to achieve the highest data transfer reliability under a given data transfer deadline, and (ii) to achieve the earliest data transfer completion time while satisfying a given data transfer reliability requirement. We propose two periodic bandwidth reservation algorithms with rigorous optimality proofs to optimize the scheduling of individual BRRs within BRR batches. The efficacy of the proposed algorithms is illustrated through extensive simulations in comparison with scheduling algorithms widely adopted in production networks in terms of various performance metrics.

Introduction

Extreme-scale distributed scientific applications often need to move large volumes of data, now frequently termed “big data”, on the order of terabytes at present and petabytes or exabytes in the near future, for collaborative data analysis and knowledge discovery [1], [2], [3]. For example, the fast-camera data from the large-scale data exploration process at the Korea Superconducting Tokamak Advanced Research (KSTAR) facility has reached 3.9 TB in 10 s [4], which leaves the existing KSTAR workflow management systems hard-pressed to keep pace. A promising solution is to quickly move the generated data to remote collaborative sites, such as Oak Ridge National Laboratory (ORNL) and the National Energy Research Scientific Computing Center (NERSC), for near-real-time analysis before the data reaches slow disk storage, which is known as in-flight analysis (or in-transit processing). If not transferred in a timely manner, the generated data may become stale and useless, resulting in a tremendous waste of resources [2]. Hence, providing fast and reliable data transfer with guaranteed performance has become a crucial task, especially in unstable networks.

Unfortunately, the default best-effort IP network infrastructure is inadequate to handle data transfers at such scales [1]. To address this challenge, new strategies for provisioning dedicated channels have been developed in several high-performance network (HPN) initiatives, and many bandwidth reservation services have been offered to support data transfer in mission-critical applications, such as On-Demand Secure Circuits and Advance Reservation System (OSCARS) of Energy Sciences Network (ESnet) [5], and Advanced Layer 2 Service (AL2S) of Internet2 [6]. Because of the rapidly increasing data transfer demands, such bandwidth reservation services are expected to proliferate into more existing and future HPNs.

HPNs typically comprise various types of network devices deployed across large geographical areas, where nodes and links are prone to failures. The challenges of bandwidth reservation arise not only from the performance requirements of bandwidth reservation requests (BRRs) but also from their reliability requirements. A good scheduling strategy should suggest data transfer options that satisfy users’ Quality of Service (QoS) requirements while accommodating as many BRRs as possible to maximize resource utilization. The ever-increasing network scale and user base also constrain the complexity of scheduling algorithms, which must remain scalable.

The core of an HPN with bandwidth reservation is a control plane that is responsible for scheduling BRRs. For an incoming BRR, the control plane computes the most appropriate end-to-end network path, if any, reserves the necessary bandwidth for the given data transfer on that path, and releases the reserved bandwidth after the data transfer is completed [1]. The focus of our work is on the design of efficient bandwidth reservation algorithms used by the control plane to guarantee certain QoS and improve resource utilization in unreliable HPNs that are subject to node and link failures. We consider two generic types of BRRs concerning data transfer reliability with different objectives and constraints: (i) to maximize the data transfer reliability under a data transfer deadline constraint, referred to as MaxR-DC, and (ii) to minimize the data transfer completion time under a data transfer reliability constraint, referred to as MinT-RC. We use the Three-parameter Generalized Gamma Distribution (3GG) to model the node and link failures in HPNs because of its generality, practicality and flexibility in modeling failure events [7], [8]. For a batch of MaxR-DC and MinT-RC BRRs, we propose two optimal periodic bandwidth reservation algorithms, referred to as Opt-MaxR-DC and Opt-MinT-RC, each of which is proved to optimize the scheduling of individual requests. For each BRR in the batch, the proposed Opt-MaxR-DC/Opt-MinT-RC algorithm returns the reservation option with the highest data transfer reliability/the earliest data transfer completion time under the deadline/reliability constraint. To illustrate the superiority of our algorithms, we conduct a comparative performance evaluation with the algorithm currently being used in OSCARS and its slightly modified version, referred to as OSCARS-MaxR-DC and OSCARS-MinT-RC, respectively. OSCARS of ESnet is one of the most widely used bandwidth reservation services in the scientific community [9]. To mimic a real HPN scenario, we gathered real network data from ESnet, reconstructed its topology, and ran extensive experiments on the simulated ESnet topology. The experimental results confirm the performance superiority of our algorithms in terms of various performance metrics.
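
To make the 3GG failure model more concrete, the following is a minimal sketch rather than the paper's formulation. It assumes Stacy's parameterization of the generalized gamma distribution with scale a and shape parameters d and p, for which the cumulative distribution function is the regularized lower incomplete gamma function P(d/p, (t/a)^p). It further assumes that components fail independently and that each component is fresh at time 0, so a path's reliability over a transfer of duration t is the product of its components' survival probabilities. The function names are hypothetical.

```python
import math


def regularized_lower_gamma(s: float, x: float, tol: float = 1e-12) -> float:
    """Regularized lower incomplete gamma P(s, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s          # k = 0 term: 1 / s
    total = term
    for k in range(1, 10_000):
        term *= x / (s + k)  # next term: x^k / (s (s+1) ... (s+k))
        total += term
        if term < tol * total:
            break
    # Multiply by x^s e^{-x} / Gamma(s), computed in log space for stability.
    value = total * math.exp(s * math.log(x) - x - math.lgamma(s))
    return min(1.0, value)


def survival(t: float, a: float, d: float, p: float) -> float:
    """Probability that a component with 3GG lifetime (a, d, p) survives past t."""
    if t <= 0.0:
        return 1.0
    return 1.0 - regularized_lower_gamma(d / p, (t / a) ** p)


def path_reliability(components, t: float) -> float:
    """Product of survival probabilities; assumes independent failures."""
    reliability = 1.0
    for a, d, p in components:
        reliability *= survival(t, a, d, p)
    return reliability
```

With d = p = 1 the model reduces to an exponential lifetime, and with d = p it reduces to a Weibull lifetime, which is one reason the 3GG family is considered flexible for failure modeling.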

To the best of our knowledge, we are among the first to study transfer reliability for bandwidth reservation in HPNs. All existing research on bandwidth reservation/scheduling in networking is constrained to a given time interval, as shown in Section 2. As far as we know, we are the first to consider the case in which bandwidth reservation/scheduling extends beyond the given time interval.

Section snippets

Related work

Bandwidth reservation has attracted a great deal of attention from researchers, and has been studied in various contexts and areas in the past decade. We provide below a brief survey of research efforts directly related to bandwidth reservation.

Chen and Primet define a flexible reservation framework using time-rate function algebra, and identify a series of practical reservation schemes with increasing generality and potential performance in grid networks [10]. These reservation schemes are

Mathematical models

We model an HPN as a graph G(V, E), where V and E represent the sets of nodes and edges, respectively [2], [3], [9]. For illustration purposes, we provide an example HPN, whose topology and available link bandwidth table within time interval (0, 10 s) are shown in Fig. 1, where V = {v_s, a, b, v_d} and E = {v_s-a, a-v_d, a-b, b-v_d}. For convenience, we tabulate the parameters used in our models in Table 1.
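
The graph model above can be sketched in a few lines. This is an illustration only: the node and edge sets follow the example topology, but the bandwidth values are placeholders (the real ones are in Fig. 1, which is not reproduced here), links are assumed to be undirected, and the time-varying bandwidth table is collapsed into a single constant value per link. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]

# Example topology: V = {vs, a, b, vd}, E = {vs-a, a-vd, a-b, b-vd}.
# Bandwidths (Gb/s) are made-up placeholders, not the values from Fig. 1.
AVAILABLE_BW: Dict[Edge, float] = {
    ("vs", "a"): 8.0,
    ("a", "vd"): 3.0,
    ("a", "b"): 6.0,
    ("b", "vd"): 5.0,
}


def build_adjacency(edges: Iterable[Edge]) -> Dict[str, List[str]]:
    """Undirected adjacency lists built from an edge collection."""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def simple_paths(adj, src: str, dst: str, prefix=None) -> List[List[str]]:
    """All loop-free paths from src to dst, found by depth-first search."""
    prefix = (prefix or []) + [src]
    if src == dst:
        return [prefix]
    paths: List[List[str]] = []
    for nxt in adj.get(src, []):
        if nxt not in prefix:
            paths.extend(simple_paths(adj, nxt, dst, prefix))
    return paths


def bottleneck(path: List[str], bw: Dict[Edge, float]) -> float:
    """Smallest available bandwidth along the path (its usable rate)."""
    return min(bw.get((u, v), bw.get((v, u))) for u, v in zip(path, path[1:]))


ADJ = build_adjacency(AVAILABLE_BW)
PATHS = simple_paths(ADJ, "vs", "vd")
```

Under these placeholder values, the two candidate paths from vs to vd are vs-a-vd and vs-a-b-vd, and the longer path offers the larger bottleneck bandwidth.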

We consider two types of BRRs, namely MaxR-DC and MinT-RC, concerning data transfer reliability with

Algorithm design for MaxR-DC

In this section, we design the optimal algorithm for MaxR-DC, referred to as Opt-MaxR-DC, with a detailed example-based illustration. For comparison, we also present the scheduling algorithm currently being used by OSCARS with an illustration using the same example.

Algorithm design for MinT-RC

In this section, we design the optimal algorithm for MinT-RC, referred to as Opt-MinT-RC, with a detailed example-based illustration. The scheduling algorithm currently being used by OSCARS is not designed for MinT-RC BRRs. For performance comparison, we slightly modify the OSCARS scheduling algorithm to schedule MinT-RC BRRs, referred to as OSCARS-MinT-RC. We illustrate OSCARS-MinT-RC using the same example as in the illustration of Opt-MinT-RC.
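
The two request types differ only in which quantity is optimized and which is constrained. The sketch below contrasts the two selection criteria over an already enumerated list of candidate reservation options. It is not Opt-MaxR-DC, Opt-MinT-RC, or the OSCARS algorithm, none of which are detailed in this excerpt; the `Option` record, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Option:
    """One candidate reservation: a path, a time window, and its reliability."""
    path: Tuple[str, ...]
    start: float        # transfer start time (s)
    end: float          # transfer completion time (s)
    reliability: float  # probability that the transfer completes without failure


def select_max_reliability(options: Sequence[Option], deadline: float) -> Optional[Option]:
    """MaxR-DC criterion: most reliable option that finishes by the deadline."""
    feasible = [o for o in options if o.end <= deadline]
    return max(feasible, key=lambda o: o.reliability, default=None)


def select_min_completion(options: Sequence[Option], min_reliability: float) -> Optional[Option]:
    """MinT-RC criterion: earliest-finishing option that is reliable enough."""
    feasible = [o for o in options if o.reliability >= min_reliability]
    return min(feasible, key=lambda o: o.end, default=None)


# Made-up candidates for illustration only.
CANDIDATES = [
    Option(("vs", "a", "vd"), 0.0, 4.0, 0.90),
    Option(("vs", "a", "b", "vd"), 0.0, 6.0, 0.97),
    Option(("vs", "a", "b", "vd"), 2.0, 9.0, 0.99),
]
```

Both functions return `None` when no option satisfies the constraint, which corresponds to a request that cannot be accommodated.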

Performance evaluation

OSCARS of ESnet is one of the most widely used bandwidth reservation services in the scientific community. More than 40 DOE research sites, including the entire national laboratory system together with its supercomputing facilities, and 140 research and commercial networks around the world use OSCARS to transfer scientific data [31], [32]. To mimic real-life ESnet scenarios, we construct its topology using real network data gathered from ESnet [2], [33]. Simulated topology of ESnet is

Conclusion and future work

The bandwidth reservation service provided by HPNs has proved to be a promising solution for big data transfer. In HPNs, node and link failures are inevitable, and such failures potentially degrade the quality of data transfer. We formulated two generic types of BRRs concerning data transfer reliability, i.e. MaxR-DC and MinT-RC, and proposed optimal algorithms, i.e. Opt-MaxR-DC and Opt-MinT-RC, for a batch of BRRs with rigorous optimality proofs. For each BRR in the batch, the proposed

References (34)

  • K.-D. Lee et al.

    Optimization for adaptive bandwidth reservation in wireless multimedia networks

    Comput. Netw.

    (2002)
  • Y. Lin et al.

    Complexity analysis and algorithm design for advance bandwidth scheduling in dedicated networks

    IEEE/ACM Trans. Netw.

    (2013)
  • L. Zuo et al.

    Fast and efficient bandwidth reservation algorithms for dynamic network provisioning

    J. Netw. Syst. Manag.

    (2013)
  • L. Zuo et al.

    Concurrent bandwidth reservation strategies for big data transfers in high-performance networks

    IEEE Trans. Netw. Serv. Manag.

    (2015)
  • E. Dart et al.

    Fusion energy sciences network requirements review - final report 2014

    Proceedings of ESnet Network Requirements Workshop

    (2014)
  • N. Charbonneau et al.

    Advance reservation frameworks in hybrid IP-WDM networks

    IEEE Commun. Mag.

    (2011)
  • R. Summerhill

    The new Internet2 network

    Proceedings of 6th Global Lambda Integrated Facility

    (2006)
  • E. Stacy

    A generalization of the gamma distribution

    Ann. Math. Stat.

    (1962)
  • M. Khodabin et al.

    Some properties of generalized gamma distribution

    Math. Sci.

    (2010)
  • M. Balman et al.

    A flexible reservation algorithm for advance network provisioning

    Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

    (2010)
  • B. Chen and P. Primet, A flexible bandwidth reservation framework for bulk data transfers in grid networks, Computing...
  • S. Sahni et al.

    Bandwidth scheduling and path computation algorithms for connection-oriented networks

    Proceedings of the 6th International Conference on Network

    (2007)
  • Y. Lin et al.

    On design of bandwidth scheduling algorithms for multiple data transfers in dedicated networks

    Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications

    (2008)
  • H.N. Nguyen et al.

    A novel mobility model and resource reservation strategy for multimedia LEO satellite networks

    Proceedings of Wireless Communications and Networking Conference, 2002. WCNC2002

    (2002)
  • L. Qiao et al.

    Adaptive bandwidth reservation and scheduling for efficient wireless telemedicine traffic transmission

    IEEE Trans. Veh. Technol.

    (2011)
  • A. Esmailpour et al.

    Dynamic QoS-based bandwidth allocation framework for broadband wireless networks

    IEEE Trans. Veh. Technol.

    (2011)
  • A. Nadembega et al.

    Mobility-prediction-aware bandwidth reservation scheme for mobile networks

    IEEE Trans. Veh. Technol.

    (2015)
    Liudong Zuo received the B.E. degree in Computer Science from the University of Electronic Science and Technology of China in 2009, and the Ph.D. degree in Computer Science from Southern Illinois University Carbondale in 2015. He is currently an assistant professor in the Computer Science Department at California State University, Dominguez Hills. His research interests include computer networks, algorithm design and application, and big data management.

    Michelle M. Zhu received her Ph.D. degree in Computer Science from Louisiana State University in 2005. She spent two years, from 2003 to 2005, in the Computer Science and Mathematics Division at Oak Ridge National Laboratory working on her Ph.D. dissertation. She is currently an associate professor in the Department of Computer Science at Montclair State University. Her research interests include parallel and distributed computing, grid and cloud computing.

    Chase Q. Wu received the B.S. degree in remote sensing and GIS from Zhejiang University, People's Republic of China in 1995, the M.S. degree in geomatics from Purdue University in 2000, and the Ph.D. degree in computer science from Louisiana State University in 2003. He was a research fellow in the Computer Science and Mathematics Division at Oak Ridge National Laboratory during 2003–2006. He is currently an Associate Professor with the Department of Computer Science at New Jersey Institute of Technology. His research interests include computer networks, remote visualization, distributed sensor networks, high performance computing, algorithms, and artificial intelligence.

    Jason Zurawski received the B.S. degree in Computer Science and Engineering from The Pennsylvania State University in 2002, and the M.S. degree in Computer and Information Science from the University of Delaware in 2007. He is a Science Engagement Engineer at the Energy Sciences Network (ESnet) in the Scientific Networking Division of the Computing Sciences Directorate of the Lawrence Berkeley National Laboratory. His professional interests include network monitoring and performance measurement, high performance computing, grid computing, and application development.