Workload management of cooperatively federated computing clusters

Xavier, Percival; Cai, Wentong; Lee, Bu-Sung

doi:10.1007/s11227-006-8300-7

Published: June 2006

Volume 36, pages 309–322, (2006)

The Journal of Supercomputing

Percival Xavier¹,
Wentong Cai¹ &
Bu-Sung Lee¹

Abstract

Cooperative resource sharing enables distinct organizations to form a federation of computing resources. The motivation for cooperation is that, because the usage patterns of their local resources are irregular, organizations are likely to serve each other by trading unused CPU cycles. In this way, resource sharing enables organizations to purchase resources at a feasible level while still meeting peak computational throughput requirements. This federation results in a community grid that must be managed. A functional broker is deployed to facilitate remote resource access within the community grid. A major issue is correlation in job arrivals caused by seasonal usage and/or coincident resource demand patterns. These correlations produce high levels of burstiness in job arrivals, causing the broker's job queue to grow to such an extent that its performance becomes severely impaired. Since job arrivals cannot be controlled, management strategies must be employed to admit jobs in a manner that sustains a fair level of resource allocation performance at all participating organizations in the community. In this paper, we present a theoretical analysis of the effect of job traffic burstiness on resource allocation performance in order to elicit the general job management strategies to be employed. Based on the analysis, we define and justify a set of job management strategies for the resource broker to cope with overload conditions caused by job arrival correlations.

Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore, 639798
Percival Xavier, Wentong Cai & Bu-Sung Lee

Corresponding author

Correspondence to Wentong Cai.

About this article

Cite this article

Xavier, P., Cai, W. & Lee, BS. Workload management of cooperatively federated computing clusters. J Supercomput 36, 309–322 (2006). https://doi.org/10.1007/s11227-006-8300-7

Issue Date: June 2006
DOI: https://doi.org/10.1007/s11227-006-8300-7
