QoS and preemption aware scheduling in federated and virtualized Grid computing environments
Highlights
- We consider a federation of Grids where external requests have different QoS requirements.
- We propose a workload allocation policy and a dispatch policy.
- We examine the number of VM preemptions that take place.
- The proposed workload allocation policy significantly decreases the number of VM preemptions.
- The proposed dispatch policy reduces the likelihood of preempting external requests with higher QoS requirements.
Introduction
Resource provisioning for user applications is one of the main challenges and research areas in federated Grid environments. Federated Grids, such as InterGrid, enable sharing, selection, and aggregation of resources across several Grids connected through high-bandwidth network links. Nowadays, heavy computational demands, mostly from scientific communities, are met by such federated environments, e.g., PlanetLab [8]. Job abstraction is widely used in resource management of Grid environments. However, owing to the advantages of Virtual Machine (VM) technology, many resource management systems have recently emerged that enable another style of resource management based on lease abstraction [38].
InterGrid, as a federated Grid environment, also aims to provide a software system that interconnects islands of virtualized Grids. It provides resources in the form of VMs and allows users to create execution environments for their applications on the VMs [12]. In each constituent Grid, the provisioning rights over several clusters inside the Grid are delegated to the InterGrid Gateway (IGG). IGGs coordinate resource allocation for requests coming from other Grids (external users) through predefined contracts between Grids [11]. On the other hand, local users in each cluster send their requests directly to the local resource manager (LRM) of the cluster.
Hence, resource provisioning is done for two different types of users: local users and external users. As illustrated in Fig. 1, local users (whose requests are hereafter termed local requests) ask their local cluster resource manager (LRM) for resources, whereas external users (whose requests are hereafter termed external requests) send their requests to a gateway (IGG) to gain access to a larger pool of shared resources. Typically, local requests have priority over external requests in each cluster [6]; in other words, the organization that owns the resources wants to ensure that its own community has priority access to them. Under such circumstances, external requests are welcome to use resources when they are available. Nonetheless, external requests should not delay the execution of local requests.
In our previous research [33], we demonstrated how preemption of external requests in favor of local requests can help serve more local requests. However, the side-effects of preemption are twofold:
- From the system owner's perspective, preempting VMs imposes a notable overhead on the underlying system and degrades resource utilization [38].
- From the external user's perspective, preemption increases the response time of external requests.
The problem becomes more complicated when external requests have different levels of Quality of Service (QoS) requirements (also termed different request types in this paper). For instance, some external requests have deadlines whereas others do not. Preemption affects the QoS constraints of such requests. This implies that some external requests are more valuable than others; therefore, more valuable requests should be given precedence by reducing their chance of being preempted.
To address these problems, in this paper we propose a QoS- and preemption-aware scheduling policy for a virtualized Grid that contributes resources to a federated Grid. This scheduling policy comprises two parts.
The first part, called the workload allocation policy, determines the fraction of external requests that should be allocated to each cluster so as to minimize the number of VM preemptions. The proposed policy is based on a stochastic analysis of routing in parallel, non-observable queues. Moreover, the policy is knowledge-free (i.e., it does not depend on availability information from the clusters) and therefore imposes no monitoring overhead on the system. However, it does not decide which cluster each individual external request should be dispatched to upon arrival; in other words, dispatching of external requests to clusters is random.
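A minimal sketch may clarify what "knowledge-free" random dispatch means in practice. The snippet below is illustrative only (the function name `random_dispatch` and the example fractions are our own, not from the paper): each cluster is assigned a fixed routing fraction, and each arriving external request is sent to a cluster drawn at random according to those fractions, without consulting any cluster-state information.

```python
import random

def random_dispatch(fractions, rng):
    """Pick a cluster index according to fixed routing fractions.

    `fractions` holds one routing probability per cluster (summing to 1).
    No cluster-state information is consulted, so the dispatch is
    knowledge-free and adds no monitoring overhead.
    """
    return rng.choices(range(len(fractions)), weights=fractions, k=1)[0]

# Example: three clusters receiving 50%, 30%, and 20% of external requests.
rng = random.Random(42)
fractions = [0.5, 0.3, 0.2]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[random_dispatch(fractions, rng)] += 1
```

Over many requests, the observed shares converge to the target fractions, but any individual request's destination remains random; the dispatch policy of the second part removes exactly this randomness.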
Therefore, in the second part, called the dispatch policy, we propose a policy that determines the cluster to which each request should be allocated. The dispatch policy is aware of request types and aims to minimize the likelihood of preempting valuable requests. This is achieved by working out a deterministic sequence for dispatching external requests. In summary, our paper makes the following contributions:
- Providing an analytical queuing model for a Grid, based on routing in parallel non-observable queues.
- Adapting the proposed analytical model into a preemption-aware workload allocation policy.
- Proposing a deterministic dispatch policy that gives higher priority to more valuable users and meets their QoS requirements.
- Evaluating the proposed policies under realistic workload models, considering performance metrics such as the number of VM preemptions, utilization, and average weighted response time.
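To illustrate the deterministic-sequence idea mentioned above, the sketch below generates a periodic dispatch order from the routing fractions in the spirit of the billiard sequences cited in the references. It is not the paper's algorithm: the function name `billiard_sequence` and the greedy rule (always dispatch to the cluster whose share lags furthest behind its target) are a simplified stand-in.

```python
def billiard_sequence(fractions, length):
    """Generate a deterministic dispatch order from routing fractions.

    At each step, dispatch to the cluster j minimizing
    (counts[j] + 1) / fractions[j], i.e., the cluster whose realized
    share lags furthest behind its target fraction. This realizes the
    fractions exactly over matching horizons while spacing each
    cluster's requests as evenly as possible.
    """
    counts = [0] * len(fractions)
    order = []
    for _ in range(length):
        j = min(range(len(fractions)),
                key=lambda i: (counts[i] + 1) / fractions[i])
        counts[j] += 1
        order.append(j)
    return order

# Fractions [0.5, 0.3, 0.2] over 10 slots yield 5, 3, and 2 dispatches
# to clusters 0, 1, and 2 respectively, in an evenly interleaved order.
seq = billiard_sequence([0.5, 0.3, 0.2], 10)
```

Because the order is fixed and repeatable, the gateway can reserve specific slots in it for the more valuable request types, which is the lever the dispatch policy uses to reduce their preemption likelihood.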
Section snippets
InterGrid environment
In this section, we provide a brief overview of the InterGrid architecture and implementation. Interested readers can refer to [12] for more details.
Analytical queuing model
In this section, we describe the analytical modeling of preemption in a virtualized Grid environment based on routing in parallel queues. We then present our proposed scheduling policy in the IGG, built upon this analytical model.
The queuing model that represents a gateway along with several non-dedicated clusters (i.e., clusters whose resources are shared between local and external requests) is depicted in Fig. 3. There are several clusters, where each cluster receives requests
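The preemption dynamics this model captures can be sketched with a toy, event-driven simulation of a single non-dedicated cluster. This is not the paper's analytical model (which is queuing-theoretic); the function `count_preemptions` and its event format are our own simplification, in which external requests use idle nodes but an arriving local request preempts a running external VM when no node is free.

```python
def count_preemptions(events, nodes):
    """Toy model of a non-dedicated cluster with `nodes` identical nodes.

    `events` is a chronological list of (who, what) tuples with
    who in {"local", "external"} and what in {"arrive", "finish"}.
    Returns the number of VM preemptions, the overhead metric the
    workload allocation policy tries to minimize.
    """
    running_local = running_external = preemptions = 0
    for who, what in events:
        if what == "finish":
            if who == "local" and running_local:
                running_local -= 1
            elif who == "external" and running_external:
                running_external -= 1
        elif running_local + running_external < nodes:
            # An idle node exists: either request type may start at once.
            if who == "local":
                running_local += 1
            else:
                running_external += 1
        elif who == "local" and running_external:
            # Cluster full: a local arrival preempts one external VM.
            running_external -= 1
            running_local += 1
            preemptions += 1
        # Otherwise the arrival waits in the LRM queue (omitted here).
    return preemptions

# Two-node cluster: two external VMs start, then two local arrivals
# each preempt one of them.
demo_events = [
    ("external", "arrive"),
    ("external", "arrive"),
    ("local", "arrive"),
    ("local", "arrive"),
]
```

Running `count_preemptions(demo_events, nodes=2)` yields 2 preemptions; routing fewer external requests to heavily loaded clusters reduces exactly this count.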
QoS and preemption-aware scheduling
In this section, we propose a workload allocation policy and a dispatch policy. The positioning of this scheduling policy within the IGG is shown in Fig. 2. The proposed scheduling policy comprises two parts. The first part discusses how the analysis of the previous section can be adapted as the workload allocation policy for external requests in the IGG. The second part is a dispatch policy that determines the sequence of dispatching external requests to different clusters
Performance evaluation
In this section, we discuss the performance metrics considered and the scenario in which the experiments are carried out; finally, we discuss the experimental results obtained from the simulations.
Related work
Several research works have investigated “preemption” of jobs/requests in parallel and distributed computing. Scheduling a mixture of different job/request types has also been extensively studied; in particular, mixtures of local and external requests have been investigated [24], [14], [4], [3]. Meta-scheduling has also been under thorough investigation in multi-cluster/Grid computing environments. In this section, we provide a review of recent studies in these areas and
Conclusions and future work
In this research, we explored how to minimize the side-effects of VM preemption in a federation of virtualized Grids such as InterGrid. We considered circumstances in which local requests in each cluster of a Grid coexist with external requests. In particular, we considered situations where external requests have different levels of QoS (i.e., some external requests are more important than others). For this purpose, we proposed a preemption-aware workload allocation policy (PAP) in the IGG to distribute
References (43)
- et al., A progressive multi-layer resource reconfiguration framework for time-shared Grid systems, Future Generation Computer Systems (2009)
- Optimal load distribution in nondedicated heterogeneous cluster and Grid computing environments, Journal of Systems Architecture (2008)
- et al., A general distributed scalable Grid scheduler for independent tasks, Journal of Parallel and Distributed Computing (2009)
- et al., Novel critical-path based low-energy scheduling algorithms for heterogeneous multiprocessor real-time embedded systems
- et al., The power of preemption in economic online markets
- J. Anselmi, B. Gaujal, Optimal routing in parallel, non-observable queues and the price of anarchy revisited, in: 22nd...
- et al., Dynamic routing of customers with general delay costs in a multiserver queuing system, Probability in the Engineering and Informational Sciences (2009)
- et al., Adaptive optimal load balancing in a nonhomogeneous multiserver system with a central job scheduler, IEEE Transactions on Computers (1990)
- et al., Priorities among multiple queues for processor co-allocation in multicluster systems
- J.S. Chase, D.E. Irwin, L.E. Grit, J.D. Moore, S.E. Sprenkle, Dynamic virtual clusters in a Grid site manager, in:...
- PlanetLab: an overlay testbed for broad-coverage services, ACM SIGCOMM Computer Communication Review
- Performance analysis of multiple site resource provisioning: effects of the precision of availability information
- InterGrid: a case for internetworking islands of Grids, Concurrency and Computation: Practice and Experience
- Harnessing cloud technologies for a virtualized distributed computing infrastructure, IEEE Internet Computing
- Performance modeling and prediction of nondedicated network computing, IEEE Transactions on Computers
- Prospects of collaboration between compute providers by means of job interchange
- Dynamic scheduling of parallel jobs with QoS demands in multiclusters and Grids
- Allocating non-real-time and soft real-time jobs in multiclusters, IEEE Transactions on Parallel and Distributed Systems
- Periodic routing to parallel queues and Billiard sequences, Mathematical Methods of Operations Research
Mohsen Amini Salehi is a Ph.D. student under the supervision of professor Rajkumar Buyya in CLOUDS lab, Melbourne University, Australia. He was a university lecturer in Azad University of Mashhad, Iran in 2006–2008. He received his M.Sc. from Ferdowsi University of Mashhad and B.Sc. from Azad University of Mashhad in Software Engineering in 2006 and 2003, respectively. His thesis for his M.Sc. was on load balancing in Grid computing. Currently, he is involved in the InterGrid project and he works on preemption-aware scheduling methods in virtualized resource providers.
Bahman Javadi is a Research Fellow at the University of Melbourne, Australia. He was a postdoctoral fellow in the MESCAL team at INRIA Rhone-Alpes, France, in 2008–2010. He received his M.S. and Ph.D. in Computer Engineering from Amirkabir University of Technology in 2001 and 2007, respectively. He worked as a research scholar in the School of Engineering and Information Technology, Deakin University, Australia, from 2005 to 2006. He is co-founder of the Failure Trace Archive, which serves as a public repository of failure traces and algorithms for distributed systems. He has served on the program committees of many international conferences and workshops and as co-guest editor of a special issue of Future Generation Computer Systems on Desktop Grids. His research interests include Cloud and Grid computing, performance evaluation of large-scale distributed computing systems, and reliability and fault tolerance.
Dr. Rajkumar Buyya is Professor of Computer Science and Software Engineering; and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft Pty Ltd., a spin-off company of the University, commercializing its innovations in Grid and Cloud Computing. He has authored and published over 300 research papers and four text books. The books on emerging topics that Dr. Buyya edited include, High Performance Cluster Computing (Prentice Hall, USA, 1999), Content Delivery Networks (Springer, Germany, 2008), Market-Oriented Grid and Utility Computing (Wiley, USA, 2009), and Cloud Computing: Principles and Paradigms (Wiley, USA, 2011). He is one of the highly cited authors in computer science and software engineering worldwide (h-index = 52, g-index = 111, 14 500 citations).
Software technologies for Grid and Cloud computing developed under Dr. Buyya’s leadership have gained rapid acceptance and are in use at several academic institutions and commercial enterprises in 40 countries around the world. Dr. Buyya has led the establishment and development of key community activities, including serving as foundation Chair of the IEEE Technical Committee on Scalable Computing and of four IEEE conferences (CCGrid, Cluster, Grid, and e-Science). He has presented over 250 invited talks on his vision of IT futures and advanced computing technologies at international conferences and institutions in Asia, Australia, Europe, North America, and South America. Dr. Buyya’s contributions and international research leadership have been recognized through the “2009 IEEE Medal for Excellence in Scalable Computing” from the IEEE Computer Society, USA. Manjrasoft’s Aneka technology for Cloud Computing, developed under his leadership, received the “2010 Asia Pacific Frost and Sullivan New Product Innovation Award”.