QoS and preemption aware scheduling in federated and virtualized Grid computing environments

https://doi.org/10.1016/j.jpdc.2011.10.008

Abstract

Resource provisioning is one of the challenges in federated Grid environments, where each Grid serves requests from external users along with its local users. Increasingly, this provisioning is performed in the form of Virtual Machines (VMs). A problem arises when there are insufficient resources to serve local users, and it is further complicated when external requests have different QoS requirements. Local users can be served by preempting VMs from external users, but preemption imposes overheads on the system. The questions, therefore, are how the number of VM preemptions in a Grid can be minimized, and how the likelihood of preemption can be decreased for requests with stricter QoS requirements. We propose a scheduling policy in InterGrid, a federated Grid, which reduces the number of VM preemptions and dispatches external requests in a way that fewer requests with QoS constraints are affected by preemption. Extensive simulation results indicate that the number of VM preemptions is decreased by at least 60%, particularly for requests with stricter QoS requirements.

Highlights

  • We consider a federation of Grids where external requests have different QoS requirements.

  • We propose a workload allocation policy and a dispatch policy.

  • We examine the number of VM preemptions that take place.

  • The proposed workload allocation policy significantly decreases the number of VM preemptions.

  • The proposed dispatch policy reduces the likelihood of preempting external requests with higher QoS requirements.

Introduction

Resource provisioning for user applications is one of the main challenges and research areas in federated Grid environments. Federated Grids, such as InterGrid, enable the sharing, selection, and aggregation of resources across several Grids connected through high-bandwidth network links. Nowadays, heavy computational demands, mostly from scientific communities, are met by such federated environments, for example PlanetLab [8]. Job abstraction is widely used for resource management in Grid environments; however, owing to the advantages of Virtual Machine (VM) technology, many resource management systems have recently emerged that enable another style of resource management based on lease abstraction [38].

InterGrid, as a federated Grid environment, also aims to provide a software system that interconnects islands of virtualized Grids. It provides resources in the form of VMs and allows users to create execution environments for their applications on the VMs [12]. In each constituent Grid, the provisioning rights over several clusters inside the Grid are delegated to the InterGrid Gateway (IGG). IGGs coordinate resource allocation for requests coming from other Grids (external users) through predefined contracts between Grids [11]. On the other hand, local users in each cluster send their requests directly to the local resource manager (LRM) of the cluster.

Hence, resource provisioning is done for two different types of users: local users and external users. As illustrated in Fig. 1, local users (hereafter termed local requests) ask their local cluster resource manager (LRM) for resources, whereas external users (hereafter termed external requests) send their requests to a gateway (IGG) to gain access to a larger pool of shared resources. Typically, local requests have priority over external requests in each cluster [6]; in other words, the organization that owns the resources wants to ensure that its own community has priority access to them. Under such circumstances, external requests are welcome to use resources when they are available, but they should not delay the execution of local requests.

In our previous research [33], we demonstrated how preemption of external requests in favor of local requests can help serve more local requests. However, the side-effects of preemption are twofold:

  • From the system owner perspective, preempting VMs imposes a notable overhead on the underlying system and degrades resource utilization [38].

  • From the external user perspective, preemption increases the response time of the external requests.

As a result, both the resource owner (who prefers higher resource utilization) and external users (who are interested in shorter response times) benefit from fewer VM preemptions in the system. Given the current trend of applying VMs in distributed systems, and preemption being a distinguishing feature of VMs, we believe it is crucial to investigate policies that minimize these side-effects. Therefore, one problem we address in this research is how to decrease the number of VM preemptions that take place in a virtualized Grid environment.

The problem is further complicated when external requests have different levels of Quality of Service (QoS) requirements (also termed different request types in this paper). For instance, some external requests have deadlines whereas others do not, and preemption affects the QoS constraints of such requests. This implies that some external requests are more valuable than others; therefore, such requests should be given precedence by reducing their chance of being preempted.

To address these problems, in this paper, we propose a QoS and preemption-aware scheduling policy for a virtualized Grid that contributes resources to a federated Grid. The scheduling policy comprises two parts.

The first part, called the workload allocation policy, determines the fraction of external requests that should be allocated to each cluster so as to minimize the number of VM preemptions. The proposed policy is based on a stochastic analysis of routing in parallel, non-observable queues. Moreover, the policy is knowledge-free (i.e., it does not depend on availability information from the clusters) and therefore imposes no overhead on the system. However, it does not determine the cluster to which each individual external request should be dispatched upon arrival; in other words, dispatching of external requests to clusters remains random.
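As an illustrative sketch only (the function names and the spare-capacity heuristic below are assumptions for exposition, not the policy's actual derivation), a knowledge-free allocation of this kind could compute per-cluster fractions of external requests from long-run parameters fixed at configuration time, such as each cluster's service capacity and its local arrival rate:

```python
# Hypothetical sketch: compute the fraction of external requests routed to each
# cluster from static, long-run parameters only (no runtime availability
# queries), keeping the policy "knowledge-free". The spare-capacity heuristic
# is an assumption for illustration, not the paper's formula.

def allocation_fractions(service_rates, local_arrival_rates):
    """Return p_j, the fraction of external requests routed to cluster j.

    service_rates[j]       -- long-run processing capacity of cluster j
    local_arrival_rates[j] -- long-run arrival rate of local requests at cluster j
    """
    # Estimate the capacity left over for external requests in each cluster.
    spare = [max(mu - lam, 0.0)
             for mu, lam in zip(service_rates, local_arrival_rates)]
    total = sum(spare)
    if total == 0.0:
        # No spare capacity anywhere: fall back to a uniform split.
        n = len(service_rates)
        return [1.0 / n] * n
    # Clusters with more spare capacity absorb a larger share of external
    # requests, lowering the chance they are preempted by local ones.
    return [s / total for s in spare]


if __name__ == "__main__":
    # Three clusters: capacities and local loads in requests per hour.
    print(allocation_fractions([100.0, 60.0, 40.0], [70.0, 20.0, 35.0]))
    # spare capacities 30, 40, 5 -> fractions ~0.40, ~0.53, ~0.07
```

Because the fractions depend only on long-run rates, no availability information needs to be collected from the clusters at runtime, which is what keeps such a policy overhead-free.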

Therefore, in the second part, called the dispatch policy, we propose a policy that determines the cluster to which each request should be dispatched. The dispatch policy is aware of request types and aims to minimize the likelihood of preempting valuable requests. This is achieved by computing a deterministic sequence for dispatching external requests (an illustrative sketch is given after the contributions list below). In summary, our paper makes the following contributions:

  • Providing an analytical queuing model for a Grid, based on routing in parallel non-observable queues.

  • Adapting the proposed analytical model to a preemption-aware workload allocation policy.

  • Proposing a deterministic dispatch policy that gives higher priority to more valuable requests and meets their QoS requirements.

  • Evaluating the proposed policies under realistic workload models, considering performance metrics such as the number of VM preemptions, resource utilization, and average weighted response time.
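As an illustrative sketch of how a deterministic dispatch sequence could be realised (the largest-deficit rule and the risk-based pairing below are our assumptions, loosely inspired by the periodic/billiard routing sequences of Hordijk et al. cited in the references, and not the paper's actual algorithm):

```python
# Hypothetical sketch: build a deterministic dispatch sequence that realises
# given per-cluster fractions, then send high-QoS ("valuable") requests to the
# clusters least likely to suffer preemption. Illustration only.

def dispatch_sequence(fractions, length):
    """Return cluster indices whose empirical frequencies track `fractions`."""
    credit = [0.0] * len(fractions)
    seq = []
    for _ in range(length):
        # Accumulate each cluster's entitlement, then dispatch to the cluster
        # that is currently most "owed" a request.
        credit = [c + f for c, f in zip(credit, fractions)]
        j = max(range(len(credit)), key=lambda i: credit[i])
        credit[j] -= 1.0
        seq.append(j)
    return seq


def assign(requests, fractions, preemption_risk):
    """Pair requests with clusters drawn from the deterministic sequence.

    requests        -- list of (request_id, qos_level); higher level = stricter QoS
    preemption_risk -- per-cluster risk estimate; lower is safer for valuable requests
    """
    slots = dispatch_sequence(fractions, len(requests))
    # Give the safest slots (lowest-risk clusters) to the strictest requests.
    by_value = sorted(requests, key=lambda r: r[1], reverse=True)
    by_safety = sorted(slots, key=lambda j: preemption_risk[j])
    return list(zip(by_value, by_safety))


if __name__ == "__main__":
    fracs = [0.5, 0.3, 0.2]
    print(dispatch_sequence(fracs, 10))  # -> [0, 1, 2, 0, 0, 1, 0, 2, 1, 0]
    reqs = [("r1", 2), ("r2", 0), ("r3", 1), ("r4", 2)]
    print(assign(reqs, fracs, preemption_risk=[0.4, 0.1, 0.3]))
```

A deterministic sequence of this kind reproduces the target allocation fractions over each period while letting the gateway steer the most valuable requests toward the clusters where preemption is least likely.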

We utilize InterGrid [10], a virtualized federated Grid environment, as the context of our work. The rest of this paper is organized as follows: Section 2 provides an overview of the InterGrid environment. Section 3 describes the proposed analytical queuing model, followed by the preemption-aware scheduling policy in Section 4. Section 5 reports the performance evaluation of the proposed policy, Section 6 reviews related work, and Section 7 presents conclusions and future work.

Section snippets

InterGrid environment

In this section, we provide a brief overview of the InterGrid architecture and implementation. Interested readers can refer to [12] for more details.

Analytical queuing model

In this section, we describe the analytical modeling of preemption in a virtualized Grid environment based on routing in parallel queues. Our proposed scheduling policy in the IGG, presented in the next section, is built upon this analytical model.

The queuing model that represents a gateway along with several non-dedicated clusters (i.e. clusters with shared resources between local and external requests) is depicted in Fig. 3. There are N clusters where cluster j receives requests
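A minimal formalization of such a model, using symbols introduced here only for illustration (they are assumptions, not necessarily the paper's notation), could be written as:

```latex
% Illustrative notation (our assumption): external requests reach the gateway
% at rate \lambda and are routed to cluster j with probability p_j; cluster j
% also receives local requests at rate \lambda^{loc}_j and hosts m_j VMs, each
% with service rate \mu_j.
\lambda^{ext}_j = p_j \,\lambda, \qquad
\sum_{j=1}^{N} p_j = 1, \qquad
\rho_j = \frac{\lambda^{ext}_j + \lambda^{loc}_j}{m_j \,\mu_j}.
```

Under this assumed notation, the role of the workload allocation policy in the next section is to choose the routing probabilities p_j so that external load is placed where it is least likely to collide with local requests and be preempted.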

QoS and preemption-aware scheduling

In this section, we propose a workload allocation policy and a dispatch policy. The positioning of this scheduling policy in the IGG is shown in Fig. 2. The proposed scheduling policy comprises two parts. The first part discusses how the analysis of the previous section can be adapted as the workload allocation policy for external requests in the IGG. The second part is a dispatch policy that determines the sequence of dispatching external requests to different clusters

Performance evaluation

In this section, we describe the performance metrics considered and the scenario in which the experiments are carried out, and then discuss the experimental results obtained from the simulations.

Related work

Several research works have investigated the preemption of jobs/requests in parallel and distributed computing. Scheduling a mixture of different job/request types has also been studied extensively; in particular, the mixture of local and external requests has been investigated [24], [14], [4], [3]. Meta-scheduling has likewise been thoroughly investigated in multi-cluster/Grid computing environments. In this section, we provide a review of the recent studies in these areas and

Conclusions and future work

In this research we explored how to minimize the side-effects of VM preemptions in a federation of virtualized Grids such as InterGrid. We consider circumstances in which local requests in each cluster of a Grid coexist with external requests. In particular, we consider situations in which external requests have different levels of QoS (i.e., some external requests are more important than others). For this purpose, we proposed a preemption-aware workload allocation policy (PAP) in the IGG to distribute


References (43)

  • B. Chun et al., PlanetLab: an overlay testbed for broad-coverage services, ACM SIGCOMM Computer Communication Review (2003).
  • M. Colajanni, P. Yu, V. Cardellini, Dynamic load balancing in geographically distributed heterogeneous web servers, in: ...
  • M.D. De Assunção et al., Performance analysis of multiple site resource provisioning: effects of the precision of availability information.
  • M. De Assunção et al., InterGrid: a case for internetworking islands of Grids, Concurrency and Computation: Practice and Experience (2008).
  • A. di Costanzo et al., Harnessing cloud technologies for a virtualized distributed computing infrastructure, IEEE Internet Computing (2009).
  • J. Fontán, T. Vázquez, L. Gonzalez, R.S. Montero, I.M. Llorente, OpenNebula: the open source virtual machine manager ...
  • L. Gong et al., Performance modeling and prediction of nondedicated network computing, IEEE Transactions on Computers (2002).
  • C. Grimme et al., Prospects of collaboration between compute providers by means of job interchange.
  • L. He et al., Dynamic scheduling of parallel jobs with QoS demands in multiclusters and Grids.
  • L. He et al., Allocating non-real-time and soft real-time jobs in multiclusters, IEEE Transactions on Parallel and Distributed Systems (2006).
  • A. Hordijk et al., Periodic routing to parallel queues and Billiard sequences, Mathematical Methods of Operations Research (2004).

Mohsen Amini Salehi is a Ph.D. student under the supervision of professor Rajkumar Buyya in CLOUDS lab, Melbourne University, Australia. He was a university lecturer in Azad University of Mashhad, Iran in 2006–2008. He received his M.Sc. from Ferdowsi University of Mashhad and B.Sc. from Azad University of Mashhad in Software Engineering in 2006 and 2003, respectively. His M.Sc. thesis was on load balancing in Grid computing. Currently, he is involved in the InterGrid project and he works on preemption-aware scheduling methods in virtualized resource providers.

Bahman Javadi is a Research Fellow at the University of Melbourne, Australia. He was a postdoctoral fellow in the MESCAL team at INRIA Rhone-Alpes, France, in 2008–2010. He received his MS and Ph.D. in Computer Engineering from Amirkabir University of Technology in 2001 and 2007, respectively. He worked as a research scholar in the School of Engineering and Information Technology, Deakin University, Australia, from 2005 to 2006. He is co-founder of the Failure Trace Archive, which serves as a public repository of failure traces and algorithms for distributed systems. He has served on the program committees of many international conferences and workshops, and as co-guest editor of a special issue of the Journal of Future Generation Computer Systems on Desktop Grids. His research interests include Cloud and Grid computing, performance evaluation of large-scale distributed computing systems, and reliability and fault tolerance.

Dr. Rajkumar Buyya is Professor of Computer Science and Software Engineering, and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft Pty Ltd., a spin-off company of the University, commercializing its innovations in Grid and Cloud Computing. He has authored and published over 300 research papers and four text books. The books on emerging topics that Dr. Buyya edited include High Performance Cluster Computing (Prentice Hall, USA, 1999), Content Delivery Networks (Springer, Germany, 2008), Market-Oriented Grid and Utility Computing (Wiley, USA, 2009), and Cloud Computing: Principles and Paradigms (Wiley, USA, 2011). He is one of the most highly cited authors in computer science and software engineering worldwide (h-index = 52, g-index = 111, 14 500 citations).

Software technologies for Grid and Cloud computing developed under Dr. Buyya's leadership have gained rapid acceptance and are in use at several academic institutions and commercial enterprises in 40 countries around the world. Dr. Buyya has led the establishment and development of key community activities, including serving as foundation Chair of the IEEE Technical Committee on Scalable Computing and four IEEE conferences (CCGrid, Cluster, Grid, and e-Science). He has presented over 250 invited talks on his vision of IT futures and advanced computing technologies at international conferences and institutions in Asia, Australia, Europe, North America, and South America. These contributions and his international research leadership have been recognized through the award of the "2009 IEEE Medal for Excellence in Scalable Computing" from the IEEE Computer Society, USA. Manjrasoft's Aneka technology for Cloud Computing developed under his leadership received the "2010 Asia Pacific Frost and Sullivan New Product Innovation Award".
