Abstract
Directed acyclic graphs (DAGs) have been used extensively in Grid workflow modeling. Because Grid resources tend to be heterogeneous and dynamic, efficient and dependable scheduling of workflow jobs is essential. Achieving minimum job completion time and high resource utilization while also providing fault tolerance is a major challenge. Building on list scheduling and group scheduling, this paper proposes a novel scheduling heuristic called DAGMap. DAGMap consists of two phases, Static Mapping and Dependable Execution, and has four salient features: (1) tasks are grouped according to their dependency relationships and upward priority; (2) critical tasks are scheduled first; (3) Min-Min and Max-Min selective scheduling is used for independent tasks; and (4) a checkpoint server with cooperative checkpointing supports dependable execution. Experimental results show that DAGMap outperforms previous algorithms in terms of speedup, efficiency, and dependability.
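To make feature (3) concrete, the following is a minimal sketch of the two classic base heuristics, Min-Min and Max-Min, for a set of independent tasks. It is not DAGMap's implementation: the function name `schedule_independent`, the `etc` table layout (estimated time to compute, per task and machine), and the `pick_max` flag are all assumptions made for illustration, and the rule DAGMap uses to select between the two heuristics is not given in the abstract, so it is not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical helper, not taken from the paper.
# etc[task][machine] = estimated execution time of `task` on `machine`.
def schedule_independent(
    etc: Dict[str, Dict[str, float]],
    pick_max: bool = False,
) -> Tuple[List[Tuple[str, str, float]], float]:
    """Greedy Min-Min (pick_max=False) or Max-Min (pick_max=True) mapping.

    Returns the list of (task, machine, finish_time) decisions in the
    order they were made, and the resulting makespan.
    """
    machines = sorted({m for row in etc.values() for m in row})
    ready = {m: 0.0 for m in machines}  # time at which each machine is free
    unscheduled = set(etc)
    plan: List[Tuple[str, str, float]] = []

    while unscheduled:
        # Step 1: for each task, find the machine with the earliest
        # completion time given the current machine ready times.
        best: Dict[str, Tuple[float, str]] = {}
        for t in sorted(unscheduled):
            m = min(etc[t], key=lambda mach: ready[mach] + etc[t][mach])
            best[t] = (ready[m] + etc[t][m], m)

        # Step 2: Min-Min commits the task whose best completion time is
        # smallest; Max-Min commits the one whose best time is largest.
        chooser = max if pick_max else min
        task = chooser(best, key=lambda t: best[t][0])
        finish, machine = best[task]

        ready[machine] = finish
        plan.append((task, machine, finish))
        unscheduled.remove(task)

    return plan, max(ready.values())


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    etc = {
        "a": {"m1": 2, "m2": 4},
        "b": {"m1": 3, "m2": 1},
        "c": {"m1": 10, "m2": 12},
    }
    print(schedule_independent(etc))                  # Min-Min
    print(schedule_independent(etc, pick_max=True))   # Max-Min
```

On this made-up input Max-Min yields the shorter makespan because it places the one long task first and packs the short tasks around it. Min-Min tends to do better when task lengths are more uniform, which is the usual motivation for choosing between the two per task set.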
Cite this article
Cao, H., Jin, H., Wu, X. et al. DAGMap: efficient and dependable scheduling of DAG workflow job in Grid. J Supercomput 51, 201–223 (2010). https://doi.org/10.1007/s11227-009-0284-7