Abstract
Cooperative resource sharing enables distinct organizations to form a federation of computing resources. The motivation for cooperation is that, because the usage patterns of each organization's local resources are irregular, organizations can serve one another by trading unused CPU cycles. Resource sharing thus lets organizations purchase resources at a feasible level while still meeting their peak computational throughput requirements. The federation results in a community grid that must be managed, and a functional broker is deployed to facilitate remote resource access within it. A major issue is correlation in job arrivals caused by seasonal usage and/or coincident resource demand patterns. These correlations produce high levels of burstiness in job arrivals, causing the broker's job queue to grow to the point where its performance becomes severely impaired. Since job arrivals cannot be controlled, management strategies must admit jobs in a manner that sustains a fair level of resource allocation performance at all participating organizations in the community. In this paper, we present a theoretical analysis of the impact of job traffic burstiness on resource allocation performance in order to elicit the general job management strategies to be employed. Based on the analysis, we define and justify a job management strategy for the resource broker to cope with overload conditions caused by job arrival correlations.
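The abstract's central claim is that correlated arrivals can impair the broker even when the average load is sustainable. A toy model can illustrate it. The sketch below is a hypothetical discrete-time queue, not the paper's model or its admission strategy. The function name `simulate_broker_queue`, the per-slot dispatch `capacity`, and the optional `max_queue` threshold are all illustrative assumptions. In each time slot the broker admits arriving jobs, then dispatches at most `capacity` of them to member clusters, so the backlog follows the Lindley recursion Q[t] = max(0, Q[t-1] + A[t] - c).

```python
from typing import List, Optional, Tuple


def simulate_broker_queue(
    arrivals: List[int],
    capacity: int,
    max_queue: Optional[int] = None,
) -> Tuple[List[int], int]:
    """Hypothetical discrete-time broker queue (illustrative only).

    Each slot: arriving jobs are admitted (subject to the optional cap
    `max_queue`), then up to `capacity` queued jobs are dispatched.
    Returns the per-slot backlog after dispatch and the total number
    of rejected jobs.
    """
    backlog = 0
    history: List[int] = []
    rejected = 0
    for arriving in arrivals:
        if max_queue is None:
            admitted = arriving
        else:
            # Naive threshold admission (an assumption, not the paper's
            # strategy): accept only the jobs that fit below the cap.
            admitted = min(arriving, max(0, max_queue - backlog))
        rejected += arriving - admitted
        backlog += admitted
        # Lindley-style update: dispatch at most `capacity` jobs per slot.
        backlog = max(0, backlog - capacity)
        history.append(backlog)
    return history, rejected


# Same total load (16 jobs over 8 slots), different correlation structure.
smooth = [2] * 8
bursty = [0, 0, 0, 8, 8, 0, 0, 0]

print(simulate_broker_queue(smooth, capacity=2))
# → ([0, 0, 0, 0, 0, 0, 0, 0], 0)
print(simulate_broker_queue(bursty, capacity=2))
# → ([0, 0, 0, 6, 12, 10, 8, 6], 0)
print(simulate_broker_queue(bursty, capacity=2, max_queue=6))
# → ([0, 0, 0, 4, 4, 2, 0, 0], 8)
```

Both traces carry 16 jobs over 8 slots, which exactly matches the total dispatch capacity. Even so, the bursty trace builds a backlog of up to 12 jobs, while the smooth trace never queues. The capped run bounds the backlog but rejects 8 jobs. A naive threshold of this kind says nothing about which organization's jobs are turned away, so it does not address the fairness concern raised in the abstract.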
Cite this article
Xavier, P., Cai, W. & Lee, BS. Workload management of cooperatively federated computing clusters. J Supercomput 36, 309–322 (2006). https://doi.org/10.1007/s11227-006-8300-7