A case for cooperative and incentive-based federation of distributed clusters

https://doi.org/10.1016/j.future.2007.05.006

Abstract

Research interest in Grid computing has grown significantly over the past five years. Management of distributed resources is one of the key issues in Grid computing. Central to the management of resources is the effectiveness of resource allocation, as it determines the overall utility of the system. The current approaches to brokering in a Grid environment are non-coordinated, since application-level schedulers or brokers make scheduling decisions independently of the others in the system. Clearly, this can exacerbate the load sharing and utilization problems of distributed resources due to the sub-optimal schedules that are likely to occur. To overcome these limitations, we propose a mechanism for coordinated sharing of distributed clusters based on computational economy. The resulting environment, called Grid-Federation, allows the transparent use of resources from the federation when a cluster's local resources are insufficient to meet its users' requirements. The use of a computational economy methodology in coordinating resource allocation not only facilitates Quality of Service (QoS)-based scheduling, but also enhances the utility delivered by resources. We show by simulation that, while some users who are local to popular resources can experience higher costs and/or longer delays, the overall users' QoS demands across the federation are better met. Also, the federation's average-case message-passing complexity is seen to be scalable, though some jobs in the system may lead to large numbers of messages before being scheduled.

Introduction

Clusters of computers have emerged as mainstream parallel and distributed platforms for high-performance, high-throughput and high-availability computing. Grid [19] computing extends the cluster computing idea to wide-area networks. A grid consists of cluster resources that are usually distributed over multiple administrative domains, managed and owned by different organizations having different resource management policies. With the large scale growth of networks and their connectivity, it is possible to couple these cluster resources as a part of one large Grid system. Such large scale resource coupling and application management is a complex undertaking, as it introduces a number of challenges in the domain of security, resource/policy heterogeneity, resource discovery, fault tolerance, dynamic resource availability and underlying network conditions.

The resources on a Grid (e.g. clusters, supercomputers) are managed by local resource management systems (LRMSes) such as Condor [28] and PBS [7]. These resources can also be loosely coupled to form campus grids using multi-clustering systems such as SGE [22] and LSF [40], which allow the sharing of clusters owned by the same organization. However, these systems do not allow clusters to be combined, in the manner of autonomous systems, into an environment for cooperative federation of clusters, which we refer to as a Grid-Federation.

Another related concept, Virtual Organization (VO) [19]-based Grid resource sharing, has been proposed in the literature. Effectively, a VO is formed to solve specific scientific problems, and all participants follow the resource management policies defined by the VO. Hence, a VO represents a socialist world, wherein the participants have to adhere to community-wide agreed policies and priorities. In contrast, the proposed Grid-Federation is a democratic world with complete autonomy for each participant. Further, a participant in the federation can behave rationally, as we propose the use of an economic model for resource management. Grid-Federation users submit their jobs to the local scheduler. If local resources are unavailable or cannot meet the requirements, the job is transparently migrated to a remote resource (site) in the federation; this migration is driven by the users' QoS requirements. In a VO, user jobs are managed by a global scheduler that enforces resource allocation based on VO-wide policies.

Scheduling jobs across resources that belong to distinct administrative domains is referred to as superscheduling. The majority of existing approaches to superscheduling [32] in a Grid environment are non-coordinated. Superschedulers or resource brokers such as Nimrod-G [1], Tycoon [27] and Condor-G [21] perform scheduling-related activities independently of the other superschedulers in the system. They directly submit their applications to the underlying resources without taking into account the current load, priorities and utilization scenarios of other application-level schedulers. Clearly, this can lead to over-utilization of, or a bottleneck on, some valuable resources while leaving others largely underutilized. Furthermore, these superschedulers have no coordination mechanism, which exacerbates the load sharing and utilization problems of distributed resources because sub-optimal schedules are likely to occur.

Furthermore, end-users or their application-level superschedulers submit jobs to the LRMS without any knowledge of the expected response time or service utility. Sometimes these jobs are queued for excessively long periods before being actually processed, leading to degraded QoS. To mitigate such long processing delays and to enhance the value of computation, a scheduling strategy can use priorities from competing user jobs that indicate varying levels of importance. This is a widely studied scheduling technique (e.g. using priority queues) [3]. To be effective, the schedulers require knowledge of how users value their computations in terms of QoS requirements, which usually varies from job to job. LRMS schedulers can also provide a feedback signal that prevents users from submitting unbounded amounts of work.
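As a rough, hypothetical illustration of such valuation-aware prioritization (not the scheduler studied in this paper), the Python sketch below orders queued jobs with a standard priority queue, ranking them by a user-declared valuation per unit of requested work; the job fields and the valuation-density rule are assumptions made for the example.

    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class QueuedJob:
        # heapq is a min-heap, so we store the negated valuation density
        # (valuation per requested processor-second) to pop the most valuable job first.
        sort_key: float
        job_id: str = field(compare=False)
        runtime_estimate: float = field(compare=False)   # seconds
        processors: int = field(compare=False)
        valuation: float = field(compare=False)          # user-declared worth of the job

    def enqueue(queue, job_id, runtime_estimate, processors, valuation):
        density = valuation / (runtime_estimate * processors)
        heapq.heappush(queue, QueuedJob(-density, job_id, runtime_estimate, processors, valuation))

    def next_job(queue):
        return heapq.heappop(queue) if queue else None

    # Example: the short, highly valued job is dispatched before the long, cheap one.
    q = []
    enqueue(q, "analysis-1", runtime_estimate=600.0, processors=8, valuation=120.0)
    enqueue(q, "bulk-sweep", runtime_estimate=36000.0, processors=64, valuation=300.0)
    print(next_job(q).job_id)   # -> analysis-1

The point of the example is only that a per-job valuation gives the LRMS a signal to order work by; how that valuation is expressed and charged is exactly what the economy model discussed below addresses.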

Currently, system-centric approaches such as Legion [13], [38], NASA-Superscheduler [33], Condor, Condor-Flock [8], AppLeS [6], PBS and SGE provide limited support for QoS-driven resource sharing. These system-centric schedulers allocate resources based on parameters that enhance system utilization or throughput. The scheduler either focuses on minimizing the response time (the sum of queue time and actual execution time) or on maximizing overall resource utilization of the system, and these objectives are not applied on a per-user basis (they are user oblivious). System-centric schedulers treat all resources on the same scale, as if they were worth the same and the results of different applications had the same value, whereas in reality a resource provider may value its resources differently and have a different objective function. Similarly, a resource consumer may value various resources differently and may want to negotiate a particular price for using a resource. Hence, resource consumers are unable to express their valuation of resources and QoS parameters. Furthermore, system-centric schedulers do not provide any mechanism for resource owners to define what is shared, who is given access and the conditions under which sharing occurs [20].

To overcome these shortcomings of non-coordinated, system-centric scheduling systems, we propose a new distributed resource management model, called Grid-Federation. Our Grid-Federation system is defined as a large-scale resource sharing system that consists of a coordinated federation of distributed clusters based on policies defined by their owners (shown in Fig. 1); the term "federation" is also used in the Legion system and should not be confused with our definition. Fig. 1 shows an abstract model of our Grid-Federation over a shared federation directory. To enable policy-based transparent resource sharing between these clusters, we define and model a new RMS system, which we call the Grid-Federation Agent (GFA). Currently, we assume that the directory information is shared using some efficient protocol (e.g. a peer-to-peer protocol [29], [25]). In this case the P2P system provides a decentralized database with efficient updates and range query capabilities. Individual GFAs access the directory information using the interfaces shown in Fig. 1, i.e. subscribe, quote, unsubscribe and query. In this paper, we are not concerned with the specifics of the interface (which can be found in [30]), although we do consider the implications of the required message-passing, i.e. the messages sent between GFAs to undertake the scheduling work.
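To make the directory interaction concrete, the following is a minimal in-memory sketch of what the subscribe/quote/unsubscribe/query calls might look like. It merely stands in for the shared, possibly P2P-based, federation directory; the class names, fields and range-query semantics are assumptions for illustration, not the interface specified in [30].

    from dataclasses import dataclass

    @dataclass
    class ResourceQuote:
        """A cluster's published share: its capacity and access price."""
        site_id: str
        total_processors: int
        speed: float          # e.g. a per-processor speed rating such as MIPS
        cost_per_sec: float   # price charged per processor-second

    class FederationDirectory:
        """In-memory stand-in for the shared federation directory used by GFAs."""

        def __init__(self):
            self._quotes = {}

        def subscribe(self, site_id):
            # Register a site before it can publish quotes.
            self._quotes.setdefault(site_id, None)

        def quote(self, quote: ResourceQuote):
            # Publish (or refresh) the site's resource quote.
            self._quotes[quote.site_id] = quote

        def unsubscribe(self, site_id):
            self._quotes.pop(site_id, None)

        def query(self, min_processors=1):
            # Range-style query: all quotes able to host the request,
            # cheapest first (ties broken by speed, fastest first).
            matches = [q for q in self._quotes.values()
                       if q is not None and q.total_processors >= min_processors]
            return sorted(matches, key=lambda q: (q.cost_per_sec, -q.speed))

    # A GFA would subscribe its cluster, publish a quote, and query on behalf of jobs:
    directory = FederationDirectory()
    directory.subscribe("siteA")
    directory.quote(ResourceQuote("siteA", total_processors=128, speed=500.0, cost_per_sec=0.004))
    print([q.site_id for q in directory.query(min_processors=64)])

In an actual deployment the query would be answered by the decentralized P2P index mentioned above rather than by a local dictionary.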

Our approach applies the emerging computational economy metaphor [1], [36], [37] to the Grid-Federation. In this case resource owners: can clearly define what is shared in the Grid-Federation while maintaining complete autonomy; can dictate who is given access; and receive incentives for leasing their resources to federation users. We adopt the market-based economic model from [1] for resource allocation in our proposed framework. Some of the commonly used economic models [9] in resource allocation include the commodity market model, the posted price model, the bargaining model, the tendering/contract-net model, the auction model, the bid-based proportional resource sharing model, the community/coalition model and the monopoly model. We focus on the commodity market model [39]. In this model every resource has a price, which is based on demand, supply and value in the Grid-Federation. Our economy-driven resource allocation methodology focuses on: (i) optimizing the resource provider's objective function, and (ii) increasing the end-user's perceived QoS value based on QoS level indicators [30] and QoS constraints.
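The fragment below sketches how a GFA might rank the quotes returned by a directory query under the commodity market model: among resources that can finish a job within its deadline, it picks the cheapest one whose total charge fits the budget. This is an illustrative cost-time style selection under assumed job fields (length, processor count, deadline, budget), not the exact algorithm used in the paper, and it reuses the ResourceQuote sketch above.

    def select_resource(quotes, job_length_mi, processors, deadline, budget):
        """Pick a quote for a job under the commodity market model.

        quotes        : iterable of ResourceQuote (see the directory sketch above)
        job_length_mi : assumed job size per processor, in millions of instructions
        deadline      : seconds the user is willing to wait for completion
        budget        : maximum amount the user is willing to pay
        """
        best = None
        for q in quotes:
            if q.total_processors < processors:
                continue
            runtime = job_length_mi / q.speed              # estimated execution time (s)
            cost = runtime * processors * q.cost_per_sec   # total charge at the quoted price
            if runtime <= deadline and cost <= budget:
                # Cost optimization: cheapest feasible quote; break ties on runtime.
                if best is None or (cost, runtime) < (best[1], best[2]):
                    best = (q, cost, runtime)
        return best   # None if no resource can satisfy the QoS constraints

A real superscheduler would also account for the expected queueing delay at each site, which this sketch deliberately omits.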

The key contributions of the paper include our proposed new distributed resource management model, called Grid-Federation, which provides: (i) a market-based Grid superscheduling technique; (ii) decentralization via a shared federation directory that gives site autonomy and scalability; (iii) the ability to provide an admission control facility at each site in the federation; (iv) incentives for resource owners to share their resources as part of the federation; and (v) access to a larger pool of resources for all users. In this paper, we demonstrate, by simulation, the feasibility and effectiveness of our proposed Grid-Federation.

The rest of the paper is organized as follows. Section 2 explores various related projects. In Section 3 we summarize our Grid-Federation and Section 4 deals with various experiments that we conducted to demonstrate the utility of our work. We end the paper with some concluding remarks and future work in Section 5.

Section snippets

Related work

Resource management and scheduling for parallel and distributed systems have been investigated extensively in the recent past (AppLeS, NetSolve [12], Condor, LSF, SGE, Legion, Condor-Flock, NASA-Superscheduler, Nimrod-G and Condor-G). In this paper, we mainly focus on superscheduling systems that allow scheduling of jobs across wide-area distributed clusters. We highlight the current scheduling methodology followed by Grid superscheduling systems including NASA-Superscheduler, …

Grid-Federation: Architecture for decentralized resource management

(1) Grid-Federation agent: We define our Grid-Federation (shown in Fig. 1) as a mechanism that enables logical coupling of cluster resources. The Grid-Federation supports policy-based [14] transparent sharing of resources and QoS-based [26] job scheduling. We also propose a new computational economy metaphor for cooperative federation of clusters. Computational economy [1], [36], [37] enables the regulation of supply and demand of resources, offers incentives to the resource owners for leasing, …

Workload and resource methodology

We used trace-based simulation to evaluate the effectiveness of the proposed system and the QoS provided by the proposed superscheduling algorithm. The workload trace data was obtained from [18]. The traces contain the real workloads of various supercomputers/resources that are deployed at the Cornell Theory Center (CTC SP2), Swedish Royal Institute of Technology (KTH SP2), Los Alamos National Lab (LANL CM5), LANL Origin 2000 Cluster (Nirvana) (LANL Origin), NASA Ames (NASA iPSC) and …
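As a rough illustration of how such trace-driven input can be fed to a simulator (the exact trace format and fields used in the paper are not reproduced here), the sketch below replays a simplified workload log, one job per line with submit time, processor count and runtime, in submission order; the CSV layout and the submit_job hook are assumptions made for the example.

    import csv

    def replay_trace(path, submit_job):
        """Replay a simplified workload trace.

        Assumes a CSV with columns: job_id, submit_time, processors, runtime
        (a simplification of archive formats such as the one behind [18]).
        submit_job(job_id, submit_time, processors, runtime) is the scheduler hook.
        """
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        # Jobs are presented to the superscheduler in order of submission time.
        rows.sort(key=lambda r: float(r["submit_time"]))
        for r in rows:
            submit_job(r["job_id"], float(r["submit_time"]),
                       int(r["processors"]), float(r["runtime"]))

    # Example hook that just logs arrivals (hypothetical file name):
    # replay_trace("ctc_sp2.csv", lambda jid, t, p, rt: print(jid, t, p, rt))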

Conclusion

We proposed a new computational economy-based distributed cluster resource management system called Grid-Federation. The federation uses agents that maintain and access a shared federation directory of resource information. A cost-time scheduling algorithm was applied to simulate the scheduling of jobs using iterative queries to the federation directory. Our results show that, while the users from popular (fast/cheap) resources face increased competition and therefore find it harder to satisfy their QoS demands, the users' QoS demands across the federation as a whole are better met.
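A minimal sketch of such an iterative-query loop is given below, reusing the directory and quote-selection helpers sketched in the introduction; the admission-control hook and the retry-on-rejection behaviour are assumptions made for illustration, not the paper's exact protocol.

    def superschedule(directory, job_id, job_length_mi, processors, deadline, budget,
                      try_admit):
        """Iteratively query the federation directory until a site admits the job.

        try_admit(site_id, job_id) models the remote GFA's admission control and
        returns True if the site accepts the job (hypothetical hook).
        """
        rejected = set()
        while True:
            quotes = [q for q in directory.query(min_processors=processors)
                      if q.site_id not in rejected]
            choice = select_resource(quotes, job_length_mi, processors, deadline, budget)
            if choice is None:
                return None                      # no site can meet the QoS constraints
            site, cost, runtime = choice
            if try_admit(site.site_id, job_id):  # admission control at the chosen site
                return site.site_id, cost, runtime
            rejected.add(site.site_id)           # otherwise try the next-best quote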

Acknowledgments

We thank our group members at the University of Melbourne, namely Marcos Assuncao, Al-Mukaddim Khan Pathan and Md Mustafizur Rahman, for their constructive comments on this paper. We would also like to thank the anonymous reviewers for their comments, which have immensely helped us improve the quality of this work. This work is supported by an Australian Research Council Discovery Project grant.

References (40)

  • D. Abramson et al., A computational economy for grid computing and its implementation in the Nimrod-G resource broker, Future Generation Computer Systems (2002)
  • B. Alexander et al., GridBank: A grid accounting services architecture for distributed systems sharing and integration
  • A.O. Allen, Probability, Statistics and Queuing Theory with Computer Science Applications (1978)
  • N. Andrade et al., OurGrid: An approach to easily assemble grids with equitable resource sharing
  • A. AuYoung, B. Chun, A. Snoeren, A. Vahdat, Resource allocation in federated distributed computing infrastructures, in: ...
  • F. Berman, R. Wolski, The AppLeS project: A status report, in: Proceedings of the 8th NEC Research Symposium, Berlin, ...
  • B. Bode et al., PBS: The Portable Batch Scheduler and the Maui Scheduler on Linux clusters
  • A. Raza Butt et al., A self-organizing flock of Condors
  • R. Buyya et al., Economic models for resource management and scheduling in grid computing, Concurrency and Computation: Practice and Experience (2002)
  • R. Buyya et al., GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing, Concurrency and Computation: Practice and Experience (2002)
  • M. Cai, M. Frank, J. Chen, P. Szekely, MAAN: A multi-attribute addressable network for grid information services, in: ...
  • H. Casanova et al., NetSolve: A network server for solving computational science problems, International Journal of Supercomputing Applications and High Performance Computing (1997)
  • S. Chapin et al., The Legion resource management system
  • J. Chase, L. Grit, D. Irwin, J. Moore, S. Sprenkle, Dynamic virtual clusters in a grid site manager, in: The Twelfth ...
  • J.Q. Cheng et al., The WALRAS algorithm: A convergent distributed implementation of general equilibrium outcomes, Computational Economics (1998)
  • B. Chun, D. Culler, A decentralized, secure remote execution environment for clusters, in: Proceedings of the 4th ...
  • J. Feigenbaum et al., Distributed algorithmic mechanism design: Recent results and future directions
  • D.G. Feitelson et al., Workload sanitation for performance evaluation
  • I. Foster et al., The Grid: Blueprint for a New Computing Infrastructure (1998)
  • I. Foster et al., The anatomy of the Grid: Enabling scalable virtual organizations, International Journal of Supercomputer Applications (2001)

Rajiv Ranjan is a final-year Ph.D. student in the Peer-to-Peer and Grids Laboratory at the University of Melbourne. Prior to his Ph.D., he completed a bachelor's degree in 2002, securing first rank in the Computer Engineering Department (North Gujarat University, Gujarat, India). He has worked as a research assistant (honors project) at the Physical Research Laboratory (a unit of the Department of Space, Government of India), Ahmedabad, Gujarat, India. He was also a lecturer in the computer engineering department (Gujarat University, Gujarat, India), where he taught undergraduate computer engineering courses including systems software, parallel computation and advanced operating systems. His current research interest lies in the algorithmic aspects of resource allocation and resource discovery in decentralised Grid and Peer-to-Peer computing systems. He has served as a reviewer for journals including Future Generation Computer Systems, Journal of Parallel and Distributed Computing, IEEE Internet Computing, and IEEE Transactions on Computer Systems. He has also served as an external reviewer for conferences including IEEE Peer-to-Peer Computing (P2P'04, P2P'05, P2P'06), IEEE/ACM Grid Computing (Grid'06) and Parallel and Distributed Computing, Applications and Technologies (PDCAT'07).

Dr. Aaron Harwood completed his Ph.D. degree at Griffith University on high performance interconnection networks in 2002. During that time he worked on several software projects, including the development of a VLSI layout package and an integrated circuit fabrication virtual laboratory now in use for classroom instruction. He has also worked at the Research Institute for Industrial Science and Technology (RIST), South Korea, on computer simulation for a robot traffic light controller system. He then joined the University of Melbourne as a lecturer in the Department of Computer Science and Software Engineering, where his research focused on the topological properties and software engineering of peer-to-peer systems for high performance computing. In 2003, he co-founded the Peer-to-Peer Networks and Applications Research Group (www.cs.mu.oz.au/p2p), for which he is now Acting Director. He recently developed one of the first parallel computing platforms for peer-to-peer networks. He is a program committee member for the 6th IEEE/ACM Workshop on Grid Computing.

Dr. Rajkumar Buyya is a Senior Lecturer and the Director of the Grid Computing and Distributed Systems (GRIDS) Laboratory within the Department of Computer Science and Software Engineering at the University of Melbourne, Australia. He received B.E. and M.E. degrees in Computer Science and Engineering from Mysore and Bangalore Universities in 1992 and 1995 respectively, and a Doctor of Philosophy (Ph.D.) in Computer Science and Software Engineering from Monash University, Melbourne, Australia in April 2002. He was awarded the Dharma Ratnakara Memorial Trust Gold Medal in 1992 for his academic excellence at the University of Mysore, India. He received Leadership and Service Excellence Awards from the IEEE/ACM International Conference on High Performance Computing in 2000 and 2003. Dr. Buyya has authored/co-authored over 130 publications. He has co-authored three books: Microprocessor x86 Programming (BPB Press, New Delhi, 1995), Mastering C++ (Tata McGraw Hill Press, New Delhi, 1997) and Design of PARAS Microkernel. The books on emerging topics that he has edited include High Performance Cluster Computing (Prentice Hall, USA, 1999) and High Performance Mass Storage and Parallel I/O (IEEE and Wiley Press, USA, 2001). He has also edited the proceedings of ten international conferences and served as guest editor for major research journals. He is serving as an Associate Editor of Future Generation Computer Systems: The International Journal of Grid Computing: Theory, Methods and Applications (Elsevier Press, The Netherlands). Dr. Buyya served as a speaker in the IEEE Computer Society Chapter Tutorials Program (1999-2001), as Founding Co-Chair of the IEEE Task Force on Cluster Computing (TFCC) from 1999 to 2004, as Interim Co-Chair of the IEEE Technical Committee on Scalable Computing (TCSC) from 2004 to September 2005, and as a member of the Executive Committee of the IEEE Technical Committee on Parallel Processing (TCPP) from 2004 to 2005. He is currently serving as Elected Chair of the IEEE Technical Committee on Scalable Computing (TCSC).

This is an extended version of a paper that was published at Cluster'05, Boston, MA.
