Distributed meta-scheduling in lambda grids by means of Ant Colony Optimization

https://doi.org/10.1016/j.future.2016.04.005

Highlights

  • Ant Colony Optimization is a viable framework for meta-scheduling in lambda grids.

  • An integrated control plane for co-allocating grid and networking resources is shown.

  • Three different resource co-allocation algorithms are proposed and evaluated.

  • RSVP-TE signaling protocol is extended to support advance reservation of resources.

  • Local scheduling and information aggregation affect meta-scheduling performance.

Abstract

Data transport infrastructures based on wavelength-routed optical networks are appealing to the grid community because they can provide a reconfigurable, high-bandwidth network service to emerging data-intensive grid applications. Traditionally, the control plane of those optical networks is independent of the management of other resources, such as computing systems, and it is commonly limited to fulfilling immediate reservation demands for lightpaths. However, the lack of integration between the network and grid resources and the absence of advance reservation of bandwidth are critical obstacles to the widespread adoption of optical networks in grid environments. Therefore, in this work, a distributed grid meta-scheduler based on an Ant Colony Optimization (ACO) algorithm is proposed, which is capable of co-allocating both optical networking and grid resources under an integrated, extended and distributed control plane that supports advance reservations. The blocking probability and the delay for starting the processing of a request are evaluated for the proposed meta-scheduler under different resource co-allocation algorithms and different meta- and local scheduling policies, including the influence of information aggregation across grid nodes. In addition, the benefits of the proposed approach are shown, such as grid and networking resource integration at the control plane, and capital expenditure reductions in the deployed optical network when compared to an immediate reservation scenario.

Introduction

Meta-scheduling is the process of scheduling applications across different sites by orchestrating the pool of resources within each local scheduler [1]. This local scheduler is commonly referred to as a Local Resource Management System (LRMS), since it manages the local resources. Those resources are made transparently available to users through network services often supported by the commodity Internet, which provides a best-effort transport service.

However, some data-intensive grid applications, such as large-scale scientific experiments, require a dedicated transport infrastructure with large bandwidth, strict levels of Quality of Service (QoS) and predictable times, which can be provided by wavelength-routed optical networks [1], [2]. When the computing resources of a grid are interconnected by an optical network that allows its applications or its meta-scheduler to dynamically request lightpaths on demand, the grid is commonly referred to as a lambda grid [1], [2].

To fulfill the task requests issued by grid applications, the grid meta-scheduler has to ensure that both computing and networking resources are available at the appropriate times by reserving them. Since a computing resource can be used only after the setup of a lightpath connecting it to the grid application is guaranteed, both computing and networking resources have to be co-allocated and then reserved by the meta-scheduler [2].
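The co-allocation constraint described above can be illustrated with a minimal sketch. It assumes a slotted time axis and represents each resource (a server, or a link of a candidate lightpath) by the set of timeslots already reserved on it. The function name `earliest_common_start` and the example calendars are hypothetical and are not taken from the paper; wavelength assignment is deliberately ignored.

```python
from typing import Iterable, Optional, Set


def earliest_common_start(
    busy_calendars: Iterable[Set[int]],
    earliest: int,
    latest: int,
    duration: int,
) -> Optional[int]:
    """Return the first start slot in [earliest, latest] at which every
    resource is free for `duration` consecutive slots, or None if the
    request would be blocked."""
    calendars = list(busy_calendars)
    for start in range(earliest, latest + 1):
        needed = range(start, start + duration)
        # The request fits only if no resource is busy in any needed slot.
        if all(slot not in busy for busy in calendars for slot in needed):
            return start
    return None


# Hypothetical state: one server and the two links of a candidate lightpath.
server_busy = {0, 1, 2}
link_a_busy = {3}
link_b_busy = {1, 6}

start = earliest_common_start(
    [server_busy, link_a_busy, link_b_busy], earliest=0, latest=5, duration=2
)
print(start)  # first slot where the server and both links are all free
```

The point of the sketch is that a slot that is free on the server alone is not enough: the reservation is admitted only where the free periods of all involved resources overlap.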

Reservations of resources in grids typically fall into two categories: immediate reservations (IR) and advance reservations (AR) [2]. With an immediate reservation, the use of resources starts as soon as the demand is admitted, whereas with an advance reservation it is delayed until a future time. Note that allowing for advance reservation in a grid environment improves the performance of the scheduling process [3]. However, advance reservations make the meta-scheduling process significantly more complex [4].

Ant Colony Optimization (ACO) [5] algorithms are a promising candidate for meta-scheduling in lambda grids. They are inspired by the foraging behavior of natural ants and are especially suited to hard combinatorial problems or situations where distributed control is needed. Indeed, in a lambda grid environment, requests are made for lightpath connectivity from an application to a computing resource that is yet to be discovered in the grid. In ACO-based algorithms, the discovery of computing resources in the grid is a by-product of the discovery of good routes by the artificial ants. Thus, the ants can gather both resource availability and routing state information in their trips throughout the lambda grid system. In other words, the ants allow for grid and networking resource integration at the control plane of the network. Hence, this information can be used in meta-scheduling and co-allocation of the lambda grid resources. In fact, the effectiveness of ACO-based algorithms has already been demonstrated for immediate reservations [6], but their support for advance reservations remained an open research issue.
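As an illustration of how pheromone levels can steer ants toward good routes, the sketch below uses a generic normalized reinforcement rule of the kind found in AntNet-style routing. It is an assumption made for illustration only and is not the update rule of this paper, which the text above does not specify. The names `choose_next_hop` and `reinforce`, the neighbor labels and the reward value are all hypothetical.

```python
import random
from typing import Dict


def choose_next_hop(pheromone: Dict[str, float], rng: random.Random) -> str:
    """Pick a neighbor with probability proportional to its pheromone level."""
    neighbors = list(pheromone)
    weights = [pheromone[n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]


def reinforce(pheromone: Dict[str, float], chosen: str, reward: float) -> Dict[str, float]:
    """Raise the level of `chosen` and lower the others proportionally.

    `reward` is assumed to lie in (0, 1). If the input levels sum to 1,
    the output levels also sum to 1.
    """
    updated = {}
    for neighbor, level in pheromone.items():
        if neighbor == chosen:
            updated[neighbor] = level + reward * (1.0 - level)  # positive feedback
        else:
            updated[neighbor] = level * (1.0 - reward)  # implicit negative feedback
    return updated


# Hypothetical pheromone table at one node for a single destination.
table = {"B": 0.5, "C": 0.3, "D": 0.2}
table = reinforce(table, chosen="C", reward=0.5)
next_hop = choose_next_hop(table, random.Random(0))
```

Selecting the next hop at random rather than always taking the strongest entry lets later ants keep exploring alternative routes, which is how the colony can adapt when resource availability changes.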

In this context, the contributions of this paper are three-fold. We present an ACO-based framework for distributed meta-scheduling in lambda grids with support for distributed advance reservation and co-allocation of both computing and optical networking resources. We also present an aggregation mechanism for the information collected by the ants to keep their overhead in the lambda grid system low. In addition, we detail the use of an extended RSVP-TE signaling protocol [7], which has already been used for distributed reservation of resources in optical networks, to also reserve other grid resources and to support advance reservations.

Simulations are carried out to evaluate the performance of the ACO algorithm under different local and meta-scheduling policies, and different resource co-allocation algorithms. Moreover, a comparison with the immediate reservation case is provided to show the importance of supporting advance reservations in order to improve the performance of the scheduling process.

The remainder of the paper is organized as follows. First, we briefly introduce the motivation of this paper and discuss related work on: (i) ant algorithms for grid meta-scheduling, and (ii) advance reservation in optical networks and co-allocation of processing and networking resources for grid environments. In Section 3, we discuss the advance reservation model and, in Section 4, the meta-scheduling architecture used throughout this work. Then, in Section 5, we present our ACO framework for distributed meta-scheduling in advance with co-allocation of computing and optical networking resources. In Section 6, we detail the simulations carried out to evaluate our proposed approach for meta-scheduling in lambda grids. The results obtained through simulations are shown and discussed in Section 7. Finally, in Section 8, conclusions are drawn.

Section snippets

Motivation and related work

To the best of our knowledge, there is no other work in the literature with explicit advance reservation and resource co-allocation using ACO-based algorithms for grid meta-scheduling. Besides, all proposed mechanisms in this work are distributed: meta-scheduling, advance reservation and co-allocation of resources.

A complete solution for an advance reservation and resource co-allocation mechanism will need to address the challenges related to the control plane protocols, albeit they are often

Advance reservation model

Advance reservations can fall into different types, according to their specifications [2], [31], [32]. For instance, if the reservation specifies a starting time and a duration, it is called STSD (Specified starting Time, Specified Duration) [31] or fixed [32]. A variation of this type is STSD with flexible window [31] or first-fit/deadline [32], where a range of starting times is defined instead of a single starting time.
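The difference between a fixed STSD request and an STSD request with flexible window can be made concrete with a small sketch. It assumes a slotted time axis; the `Request` class, its field names and the example calendar are hypothetical and serve only to illustrate the two request types.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Request:
    earliest_start: int  # first admissible starting slot
    latest_start: int    # last admissible starting slot (equal to earliest_start if fixed)
    duration: int        # number of consecutive slots required


def feasible_starts(busy: Set[int], req: Request) -> List[int]:
    """List every starting slot inside the window for which the resource
    is free during the whole requested duration."""
    return [
        start
        for start in range(req.earliest_start, req.latest_start + 1)
        if all(slot not in busy for slot in range(start, start + req.duration))
    ]


busy = {5, 6}  # hypothetical slots already reserved on the resource

fixed = Request(earliest_start=4, latest_start=4, duration=3)
flexible = Request(earliest_start=4, latest_start=8, duration=3)

print(feasible_starts(busy, fixed))     # the single starting time collides with slot 5
print(feasible_starts(busy, flexible))  # the window leaves room to start later
```

With the same calendar, the fixed request has no feasible starting time while the flexible one does, which illustrates why a window of starting times gives the scheduler more room to admit a request.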

In this work, we consider STSD requests with flexible window, which

Meta-scheduling architecture

Meta-schedulers can be classified into three different models  [3]. In the centralized model, the meta-scheduler is a central instance that has a complete knowledge of the usage of the grid resources. Indeed, in this case, the meta-scheduler has a full control over each local scheduler of the grid. The hierarchical model is a variation of the centralized scheme, where the meta-scheduler is a central instance that communicates with other schedulers of its hierarchy. In distributed

Ant Colony Optimization (ACO)

ACO algorithms are based on artificial stigmergy  [35], where artificial pheromone levels have positive or negative feedback according to the solution quality seen by the ants. Since those levels contain information from previous solutions of the problem, they can be explored collectively by the ants to improve the solution. Although the artificial ant is a simple, lightweight mobile agent  [36], [37], stigmergy allows for the ant colony to exhibit an emergent, self-organizing behavior  [38],

Simulation

For evaluating the proposed algorithms, we used the NSFNet backbone network that is shown in Fig. 5, where the latencies between neighbor nodes are depicted at each link. It is a 14-node network with 21 bidirectional links and it is well-balanced  [20], with average shortest-path length between all pairs of nodes equal to 2.2 hops and diameter equal to 3. The NSFNet network is a very common benchmark for assessing routing performance  [1], [6], [20], [27], [28], [30], being a conservative

Numerical results

The following figures use a notation of the form: meta-scheduling policy (CA, LL or BADR)/local scheduling approach for the processing resources (EST or FF)—local scheduling approach for the networking resources (EST or FF). As already explained for the SF algorithm, since the EST and FF local scheduling approaches are equivalent for the networking resources, that part of the label is omitted for the sake of clarity.

First of all, we evaluate the influence of the fixed timeslot duration (T

Conclusions

In this work, we presented an ACO-based distributed meta-scheduler with distributed resource co-allocation and advance reservation support in lambda grids. We proposed three different resource co-allocation algorithms: Server First, Server First-Relaxed and Network First. We demonstrated that the best strategy is to allocate first the networking resources and then allocate the processing resources, i.e., the Network First algorithm. However, the RSVP-TE signaling protocol has to be extended to

Acknowledgments

This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (n2008/57857-2), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (n574017/2008-9), and Instituto Nacional de Ciência e Tecnologia Fotônica para Comunicações Ópticas (FOTONICOM).


References (44)

  • S. Sotiriadis, N. Bessis, F. Xhafa, N. Antonopoulos, From meta-computing to interoperable infrastructures: A review of...
  • Q. Snell, M. Clement, D. Jackson, C. Gregory, The performance impact of advance reservation meta-scheduling, in:...
  • M. Dorigo et al., Ant Colony Optimization (2004)
  • L. Berger, Generalized multi-protocol label switching (GMPLS) signaling resource ReserVation protocol-traffic...
  • H. Yan, X.-Q. Shen, X. Li, M.-H. Wu, An improved ant algorithm for job scheduling in grid computing, in: Fourth...
  • M. Bandieramonte, A. Di Stefano, G. Morana, An ACO inspired strategy to improve jobs scheduling in a grid environment,...
  • K.R. Ku-Mahamud, H.J.A. Nasir, Ant colony algorithm for job scheduling in grid computing, in: Fourth Asia International...
  • W.-N. Chen et al., An ant colony optimization approach to a grid workflow scheduling problem with various QoS requirements, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. (2009)
  • T. Kokilavani, D.G. Amalarethinam, An ant colony optimization based load sharing technique for meta task scheduling in...
  • H. Wenming, D. Zhenrong, W. Peizhi, Trust-based ant colony optimization for grid resource scheduling, in: Third...
  • A. Kant, A. Sharma, S. Agarwal, S. Chandra, An ACO approach to job scheduling in grid environment, in: First...
  • G.S. Pavani, H. Waldman, Grid resource management by means of ant colony optimization, in: Third International...

    Gustavo Sousa Pavani graduated from University of Campinas (UNICAMP) in 2001 with a degree in Computer Engineering. He received his M.Sc. and Ph.D. degrees in Electrical Engineering from UNICAMP in 2003 and 2006, respectively. Currently, he is an associate professor at Universidade Federal do ABC (UFABC), Brazil. His research interests include routing algorithms for packet-switched and circuit-switched optical networks by means of ant colony optimization (ACO), the GMPLS control plane, and optical network support for grid and cloud architectures.

    Rodrigo Izidoro Tinini graduated from Universidade Municipal de São Caetano do Sul (USCS) in 2011 with a degree in Computer Science. He received his M.Sc. degree in Computer Science from Federal University of ABC (UFABC) in 2014. Currently, he is a Ph.D. student at University of São Paulo (USP) with a scholarship awarded by Hewlett-Packard. His research interests include grid computing, optical networking and artificial intelligence.
